
A comparison of Kaplan–Meier-based inverse probability of censoring weighted regression methods

  • Open Access
  • 28-10-2025
Abstract

This article compares three Kaplan–Meier-based inverse probability of censoring weighted regression methods for survival analysis. The primary focus is on understanding the efficiency and bias of these methods in different scenarios, particularly when dealing with censored data. The article explores the theoretical aspects of each method, providing a framework for their comparison. It also presents simulation studies that illustrate the practical performance of these methods under various conditions, including different censoring distributions and sample sizes. The results highlight how the choice of method can significantly impact the variance and bias of parameter estimates. Additionally, the article discusses the role of stratification in reducing bias and the limitations of standard variance estimators. The findings suggest that the pseudo-observation approach may offer advantages in terms of smaller variance and bias, but its suitability depends on the specific context and data characteristics. Overall, the article provides valuable insights for practitioners seeking to apply these methods in their research.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

In some settings of survival analysis, the primary interest may concern one or a few time points, where measures of interest could be risk or restricted mean survival time. By focusing on one or a few time points, the analyst can easily communicate results. As pointed out by Martinussen and Scheike (2023), a particular time point may have a special role in a specific clinical subject area. For instance, disease-free survival 3 years after treatment for certain cancer patients may be taken to indicate cure of the patient. A relevant tool for adjusted comparisons of the measures of interest between groups in this sort of setting is a regression method such as logistic regression. The issue of right censoring becomes a missing data problem for such regression methods by potentially rendering the outcome unobserved. In the study of treatment effects on 3-year disease-free survival of cancer patients, such a right-censoring issue may, for instance, arise when it is desired to include in the analysis more recently treated patients for whom 3 years of follow-up are not yet available.
This paper is concerned with a regression setting and the handling of outcomes that may be missing due to right censoring. Here, the handling will in some way be by use of inverse probability weighting. The weights are related to the censoring distribution and for this reason such approaches may be considered inverse probability of censoring weighting (IPCW) approaches. On the other hand, this may be considered a misnomer since the weights are, much like sampling weights, estimates of the inverse probabilities of observing the outcomes in question rather than probabilities of censoring. This paper will consider three approaches where the weights are based on Kaplan–Meier estimates of the censoring distribution, either based on the entire sample or calculated in strata. An important assumption will be independent censoring.
One approach involves a Horvitz–Thompson-type weighting of the entire individual contribution to an estimating equation, which has been considered in a survival setting by e.g. Robins and Rotnitzky (1992). A second approach, which was suggested by Scheike et al. (2008) for assessing the influence of covariates on a cumulative incidence curve, involves weighting only the potentially missing outcome for each individual. The third approach that is considered here was introduced by Andersen et al. (2003) and involves replacing the potentially missing outcomes by jack-knife pseudo-values, here based on an inverse probability weighting estimator such as the Kaplan–Meier estimator. All three approaches can be carried out using standard statistical software in a wide range of situations: the software likely facilitates calculation of Kaplan–Meier estimates and regression parameter estimates in, for instance, generalized linear models, perhaps with sampling weights, as well as robust, Huber–White-type standard errors of the regression parameter estimates based on a standard sandwich variance estimator.
The paper by Blanche et al. (2023) studies and compares the first and second approach in a setting where the outcome is having a certain event within a certain time and where a logistic regression is considered. A main conclusion of that paper is that which of the two methods is more efficient depends on the censoring distribution. Another conclusion is that a naive approach to variance estimation leads to too large, or conservative, variance estimates.
The third approach, also known as the pseudo-observation method, is reviewed by Andersen and Pohar Perme (2010) where some further background can be found. Much of the theory on the method that is useful for the purposes of this paper can be found in Overgaard et al. (2019). The paper by Andersen and Pohar Perme (2010) also suggests stratifying calculation on a variable if censoring depends on that variable, which is an approach studied in more detail in this paper.
Although the theory of these three different approaches seems clear, at least in some settings, it is, however, not clear how the three approaches compare, especially how the pseudo-observation method compares with the two other approaches in terms of efficiency. Results by Binder et al. (2014) and Parner et al. (2023) indicate that the pseudo-observation method makes more efficient use of data compared to the outcome weighting approach, but it remains unclear why this is the case and how generally this holds.
In this paper, a comparison of the three approaches is carried out theoretically, in a theoretical example, and in simulations. The theoretical comparison is in terms of the asymptotics of the three approaches in a general setting which is laid out within a common framework that includes stratification of the weight calculation. The topic of naive variance estimation using the standard sandwich variance estimator is considered for the three approaches. Some insights into the biases of the three approaches when the independent censoring assumption is violated are also gained.
In Sect. 2, the setting and main results on, in particular, the asymptotic variances are presented. In Sect. 3, a theoretical example offers insights into how the three approaches compare in a specific setting. In Sect. 4, simulations are used to illustrate, corroborate, and challenge the asymptotic results. The paper ends with a discussion of results in Sect. 5 and an appendix with technical results in section A.

2 Regression analysis with a censored outcome

Suppose Y is an outcome of interest and a model is assumed which states that a parameter vector \(\beta \in \mathbb {R}^p\) exists such that \(\operatorname {E}(Y \mathbin {\mid }X) = \mu (\beta ; X)\) for a certain function \(\mu \) where X is a vector of covariates. It is desired to estimate the true \(\beta \) based on n observations, i.e. n independent replications of that experiment. A standard approach would be to solve an estimating equation of the type
$$\begin{aligned} U_n(\beta ) := \sum _{i=1}^n A(\beta ; X_i) (Y_i - \mu (\beta ; X_i)) = 0 \end{aligned}$$
(1)
for a suitable p-dimensional vector function A. When Y is not always observed due to censoring, this approach can no longer be taken. To be more specific, a competing risks setting is now considered. Suppose Y is a function of a failure time \(T > 0\) and a failure type \(D \in \{1, \dots , d\}\), but already determined at a time point \(t>0\). In other words, \(Y = y(T \wedge t, D \mathbbm {1}(T \le t))\) for a reasonable function y. Outcomes of this type include
$$\begin{aligned} Y&= \mathbbm {1}(T > t), \quad & \text {survival to time } t. \end{aligned}$$
(2)
$$\begin{aligned} Y&= \mathbbm {1}(T \le t, D = j), \quad & \text {failure of a specific type before time } t. \end{aligned}$$
(3)
$$\begin{aligned} Y&= T \wedge t, \quad & \text {survival time restricted to time } t. \end{aligned}$$
(4)
$$\begin{aligned} Y&= (t - T \wedge t) \mathbbm {1}(D = j), \quad & \text {time lost to a specific failure before } t. \end{aligned}$$
(5)
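To make the outcome definitions concrete, the following minimal sketch computes the four example outcomes from failure times and types; the function and variable names are illustrative and not from the paper.

```python
import numpy as np

def example_outcomes(T, D, t, j=1):
    """Compute the four example outcomes (2)-(5) from failure times T,
    failure types D, a time point t, and a failure type of interest j.
    Each outcome is a function of (T ^ t, D * 1(T <= t)) only."""
    T = np.asarray(T, dtype=float)
    D = np.asarray(D)
    surv = (T > t).astype(float)                  # (2) survival to time t
    fail = ((T <= t) & (D == j)).astype(float)    # (3) failure of type j before t
    restr = np.minimum(T, t)                      # (4) restricted survival time
    lost = (t - np.minimum(T, t)) * (D == j)      # (5) time lost to type j before t
    return surv, fail, restr, lost
```

Note that each of these is indeed determined by \((T \wedge t, D \mathbbm {1}(T \le t))\): for the time-lost outcome, for instance, the factor \(t - T \wedge t\) vanishes whenever \(T > t\).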
More generally, T may be some event time and D an event type. Possible models of \(\operatorname {E}(Y\mathbin {\mid }X)\) include examples from generalized linear models: a linear model \(\mu (\beta ; x) = \beta ^{\textsf{T}}x\); a relative or exponential model, \(\mu (\beta ; x) = \exp (\beta ^{\textsf{T}}x)\); or a logistic model \(\mu (\beta ; x) = 1/(1+\exp (-\beta ^{\textsf{T}}x))\). The logistic model is primarily appropriate for the dichotomous outcomes. In all examples, the vector x likely includes a constant term and should more generally be considered a result of a vector function applied to an original set of covariates since it should also be able to hold interaction terms, for instance. In the generalized linear model setting, the function A depends on the choice of link function, which determines the structure of \(\mu \), and on the choice of family. The Gaussian family corresponds to the choice of \(A(\beta ;x) = \frac{\partial }{\partial \beta } \mu (\beta ;x)\). A simple choice is \(A(\beta ;x) = x\) and this is obtained when the link function used is canonical for the specified family.
Next, suppose T and D are not always observed due to censoring at a censoring time \(C > 0\). Instead \(\tilde{T} = T \wedge C\), the observed exit time, and \(\tilde{D} = D \mathbbm {1}(T \le C)\), the observed exit type with 0 denoting censoring, are observed. An outcome available at time t as above is observed if \(C \ge T \wedge t\) and can in this case be written as \(Y = y(\tilde{T} \wedge t, \tilde{D} \mathbbm {1}(\tilde{T} \le t))\). To handle censoring, different weighting approaches are considered. The weights are allowed to depend on covariates only through a categorization Z of the covariates X. If G denotes the conditional survival function of the censoring time C, that is, \(G(s \mathbin {\mid }z) = \operatorname {P}(C > s \mathbin {\mid }Z = z)\), a weight to consider is then
$$\begin{aligned} W = \frac{\mathbbm {1}(C \ge T \wedge t)}{G(T \wedge t- \mathbin {\mid }Z)} = \frac{\mathbbm {1}(\tilde{T} \ge t) + \mathbbm {1}(\tilde{T} < t, \tilde{D} \ne 0)}{G(\tilde{T} \wedge t- \mathbin {\mid }Z)}. \end{aligned}$$
(6)
The exact G may well be unknown, and here an approach where an estimate of the censoring distribution is used instead is considered. In the following, estimation based on n independent replications of this type of experiment where information on \((\tilde{T}, \tilde{D}, X)\) is available is considered. This setting, where censoring times are also censored by event times, allows for the use of the Kaplan–Meier estimator of G within each stratum of Z. In order to handle ties systematically and appropriately according to the described setting where the event time T takes priority over the censoring time C, a slight variation of the Kaplan–Meier estimator will in fact be used and is defined precisely in the appendix in equation (38). This estimate is called \(\hat{G}\). So, the applied weight will instead be
$$\begin{aligned} \hat{W} = \frac{\mathbbm {1}(C \ge T \wedge t)}{\hat{G}(T \wedge t- \mathbin {\mid }Z)} = \frac{\mathbbm {1}(\tilde{T} \ge t) + \mathbbm {1}(\tilde{T} < t, \tilde{D} \ne 0)}{\hat{G}(\tilde{T} \wedge t- \mathbin {\mid }Z)}. \end{aligned}$$
(7)
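As a concrete sketch of the weight construction, the following code estimates G by a stratified Kaplan–Meier estimator of the censoring distribution and forms the weights of (7). It uses the standard Kaplan–Meier estimator, not the paper's tie-handling variant from its appendix equation (38), so it agrees with the paper only up to tie handling; all names are illustrative.

```python
import numpy as np

def km_censoring(time, status, eval_times):
    """Kaplan-Meier estimate of G(s) = P(C > s), treating status == 0
    (a censoring) as the event of interest. This is the standard
    estimator; the paper's variant treats ties between events and
    censorings slightly differently."""
    uniq = np.unique(time)
    G = np.ones(len(uniq))
    g = 1.0
    for k, u in enumerate(uniq):
        at_risk = np.sum(time >= u)
        n_cens = np.sum((time == u) & (status == 0))
        g *= 1.0 - n_cens / at_risk
        G[k] = g
    # evaluate the right-continuous step function at eval_times
    idx = np.searchsorted(uniq, eval_times, side="right") - 1
    return np.where(idx >= 0, G[np.clip(idx, 0, None)], 1.0)

def ipcw_weights(time, status, Z, t, eps=1e-10):
    """Weights of eq. (7): indicator that the outcome at time t is
    observed, divided by the left limit G_hat((T~ ^ t)- | Z), with
    G_hat computed within strata of Z."""
    time, status, Z = np.asarray(time, float), np.asarray(status), np.asarray(Z)
    observed = (time >= t) | ((time < t) & (status != 0))
    s = np.minimum(time, t)
    W = np.zeros(len(time))
    for z in np.unique(Z):
        m = Z == z
        # approximate the left limit G(s-) by evaluating just below s
        G_left = km_censoring(time[m], status[m], s[m] - eps)
        W[m] = observed[m] / np.maximum(G_left, eps)
    return W
```

Individuals whose outcome is censored before t receive weight 0, while observed individuals are upweighted by the inverse estimated probability of remaining uncensored.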
Three regression approaches based on the weights \(\hat{W}\) are now considered: weighting the individual contribution and solving
$$\begin{aligned} U_{n,\textit{ind}}(\beta ) := \sum _{i=1}^n A(\beta ; X_i) \hat{W}_i (Y_i - \mu (\beta ; X_i)) = 0, \end{aligned}$$
(8)
weighting only the potentially censored outcome and solving
$$\begin{aligned} U_{n,\textit{out}}(\beta ) := \sum _{i=1}^n A(\beta ; X_i) (\hat{W}_i Y_i - \mu (\beta ; X_i)) = 0, \end{aligned}$$
(9)
and replacing the outcome with a jack-knife pseudo-observation, \(\hat{\theta }_i\), from a weighted estimator and solving
$$\begin{aligned} U_{n,\textit{pse}}(\beta ) := \sum _{i=1}^n A(\beta ; X_i) (\hat{\theta }_i - \mu (\beta ; X_i)) = 0. \end{aligned}$$
(10)
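For the linear mean \(\mu (\beta ; x) = \beta ^{\textsf{T}}x\) with \(A(\beta ;x) = x\), the three estimating equations have closed-form solutions, which makes the structural difference between the approaches plain; a minimal sketch with illustrative names, where theta denotes the pseudo-observations defined below:

```python
import numpy as np

def solve_ind(X, W, y):
    """Eq. (8) with a linear mean: weighted least squares,
    beta = (X' diag(W) X)^{-1} X' diag(W) y."""
    XW = X * W[:, None]
    return np.linalg.solve(XW.T @ X, XW.T @ y)

def solve_out(X, W, y):
    """Eq. (9) with a linear mean: ordinary least squares on the
    weighted outcome W * y."""
    return np.linalg.solve(X.T @ X, X.T @ (W * y))

def solve_pse(X, theta):
    """Eq. (10) with a linear mean: ordinary least squares on the
    pseudo-observations theta."""
    return np.linalg.solve(X.T @ X, X.T @ theta)
```

Weighting the individual contribution yields weighted least squares, while the other two approaches amount to ordinary least squares on a modified outcome.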
The pseudo-observation will be defined as
$$\begin{aligned} \hat{\theta }_i = n \hat{\theta }- (n-1) \hat{\theta }^{(i)} \end{aligned}$$
(11)
where \(\hat{\theta }\) is the overall estimate of \(\operatorname {E}(Y)\),
$$\begin{aligned} \hat{\theta }= \frac{1}{n} \sum _{j=1}^n \hat{W}_j Y_j = \frac{1}{n} \sum _{j=1}^n \frac{\mathbbm {1}(C_j \ge T_j \wedge t)}{\hat{G}(\tilde{T}_j \wedge t- \mathbin {\mid }Z_j)} Y_j, \end{aligned}$$
(12)
and \(\hat{\theta }^{(i)}\) is the estimate obtained by leaving out observation i, which can be written
$$\begin{aligned} \hat{\theta }^{(i)} = \frac{1}{n-1} \sum _{j\ne i} \hat{W}_j^{(i)} Y_j = \frac{1}{n-1} \sum _{j\ne i} \frac{\mathbbm {1}(C_j \ge T_j \wedge t)}{\hat{G}^{(i)}(\tilde{T}_j \wedge t- \mathbin {\mid }Z_j)} Y_j \end{aligned}$$
(13)
if \(\hat{G}^{(i)}\) is the stratified Kaplan–Meier estimate of G where observation i has been left out. Since \(\hat{G}^{(i)}(s \mathbin {\mid }z) = \hat{G}(s \mathbin {\mid }z)\) when observation i does not belong to stratum z, the jack-knife pseudo-observation of the overall estimator of (12) can also be calculated as the jack-knife pseudo-observation of the within-stratum estimate and equals
$$\begin{aligned} \hat{\theta }_i = \hat{W}_i Y_i + \sum _{j\ne i: Z_j = Z_i} (\hat{W}_j - \hat{W}_j^{(i)}) Y_j. \end{aligned}$$
(14)
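The pseudo-observations can also be computed directly from the leave-one-out definition (11); the following self-contained sketch does so for the unstratified estimate (12), using a plain Kaplan–Meier estimate of G (not the paper's tie-handling variant) and a brute-force leave-one-out loop rather than formula (14). Names are illustrative.

```python
import numpy as np

def ipcw_mean(time, status, y, t, eps=1e-10):
    """Unstratified IPCW estimate (12) of E(Y) for an outcome available
    at time t, with a plain Kaplan-Meier estimate of the censoring
    survival function G."""
    time, status, y = np.asarray(time, float), np.asarray(status), np.asarray(y, float)
    observed = (time >= t) | ((time < t) & (status != 0))
    s = np.minimum(time, t)
    uniq = np.unique(time)
    G = np.ones(len(uniq))
    g = 1.0
    for k, u in enumerate(uniq):
        g *= 1.0 - np.sum((time == u) & (status == 0)) / np.sum(time >= u)
        G[k] = g
    # left limit G(s-), approximated by evaluating just below s
    idx = np.searchsorted(uniq, s - eps, side="right") - 1
    G_left = np.where(idx >= 0, G[np.clip(idx, 0, None)], 1.0)
    return np.mean(observed / np.maximum(G_left, eps) * y)

def pseudo_observations(time, status, y, t):
    """Jack-knife pseudo-observations theta_i = n*theta - (n-1)*theta^(i)
    of eq. (11), by direct leave-one-out recomputation."""
    time, status, y = np.asarray(time), np.asarray(status), np.asarray(y)
    n = len(time)
    full = ipcw_mean(time, status, y, t)
    po = np.empty(n)
    for i in range(n):
        m = np.arange(n) != i
        po[i] = n * full - (n - 1) * ipcw_mean(time[m], status[m], y[m], t)
    return po
```

In the absence of censoring, \(\hat{W}_j = 1\) for all j and the pseudo-observation reduces to the outcome itself, \(\hat{\theta }_i = Y_i\).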
In other words, the pseudo-observation includes the weighted outcome from before and an additional term that takes a potential influence of the observation on the estimate of the weight into account.
It may be noted that the inverse probability weighted estimates of (12) include as examples the Kaplan–Meier estimate for the outcome \(Y = \mathbbm {1}(T > t)\), the area under the Kaplan–Meier curve for the outcome \(Y = T \wedge t\), and other common estimators of the mentioned examples of outcomes Y. Satten and Datta (2001) established this type of result for the Kaplan–Meier-based estimate of a failure probability.
Generally, the following assumptions on the censoring mechanism are made.
Assumption 1
Within strata of Z, the censoring time C is independent of event time T, type D, and covariates X. Symbolically, \(C \perp \!\!\! \perp (T, D, X) \mathbin {\mid }Z\).
Assumption 2
It is possible to observe the information of interest in all relevant strata of Z. That is, \(G(t- \mathbin {\mid }z) > 0\) for (almost) all z.
The assumptions ensure that W is well defined and that \(\operatorname {E}(W \mathbin {\mid }T, D, X) = 1\), which makes the weights suitable for compensating for the missing information. The cumulative censoring hazard can also be defined without issue in the relevant interval as follows. Let \(F_0(s \mathbin {\mid }z) = \operatorname {P}(C \le s \mathbin {\mid }Z=z)\). Define the corresponding cumulative censoring hazard by \(\Lambda (s \mathbin {\mid }z) = \int _0^s G(u- \mathbin {\mid }z)^{-1} F_0(\textrm{d} \,u \mathbin {\mid }z)\) for \(s \le t\). No assumption of continuity of these functions is made.
Other assumptions are made in the following. These assumptions include some usual requirements for the estimating procedure of the uncensored case in (1) to work, and a further positivity assumption, say \(\operatorname {P}(T> t \mathbin {\mid }Z) > 0\) almost surely, to ease the handling of the estimation of the censoring distribution. To be vague, these assumptions are collectively termed regularity conditions in what follows. Some further details on this matter are given in the appendix. The approach presented in the appendix even puts restrictive assumptions on the outcome Y, in particular boundedness. These assumptions are met by the four presented examples of outcomes, but do not seem strictly necessary for the results presented in the following.
Under regularity conditions, the original estimating equation, (1), based on uncensored information has, with a high probability for large n, as solutions consistent and asymptotically normally distributed estimates of the true \(\beta \), which will be denoted \(\beta _0\) in the following. This can be seen from results on Z-estimators, see for instance Chapter 5 of van der Vaart (1998). To be more specific, the estimates will be asymptotically linear with influence function
$$\begin{aligned} \dot{\beta }(T, D, X) = -B(X) (Y - \mu (\beta _0; X)), \end{aligned}$$
(15)
where \(B(X) = J(\beta _0)^{-1} A(\beta _0; X)\) and \(J(\beta ) = \operatorname {E}( - A(\beta ; X) \frac{\partial }{\partial \beta } \mu (\beta ; X))\). This means that
$$\begin{aligned} \sqrt{n}(\hat{\beta }_n - \beta _0) = \sqrt{n}\frac{1}{n} \sum _{i=1}^n \dot{\beta }(T_i, D_i, X_i) + o_{\operatorname {P}}(1), \end{aligned}$$
(16)
so that the asymptotic variance is \(\operatorname {Var}(\dot{\beta }(T, D, X))\).
Under similar regularity conditions, and the assumptions on the censoring mechanism mentioned above, the three weighting approaches similarly have as solutions estimates that are asymptotically linear with a specific influence function. The influence functions compare to the influence function of the uncensored problem in a certain way, as laid out in the theorem below. The notation \(M(s \mathbin {\mid }Z) = \mathbbm {1}(C \le s) - \int _0^s \mathbbm {1}(C \ge u) \Lambda (\textrm{d} \,u \mathbin {\mid }Z)\) will be used for a martingale related to the censoring.
Theorem 1
Under Assumption 1, Assumption 2, and regularity conditions, the three approaches have as solutions consistent and asymptotically normal parameter estimates with influence functions of the form
$$\begin{aligned} \begin{aligned} \dot{\beta }_{\textit{type}}(\tilde{T}, \tilde{D}, X)&= \dot{\beta }(T, D, X) \\&\quad + \int _0^{t-} \big (\phi _{\textit{type}}(s; T, D, X) - \nu _{\textit{type}}(s \mathbin {\mid }Z) \big ) \mathbbm {1}(T > s) \frac{1}{G(s \mathbin {\mid }Z)} M(\textrm{d} \,s \mathbin {\mid }Z), \end{aligned} \end{aligned}$$
(17)
where \(\nu _{\textit{type}}(s \mathbin {\mid }Z) = \operatorname {E}(\phi _{\textit{type}}(s; T, D, X) \mathbin {\mid }T > s, Z)\). The three types have
$$\begin{aligned} \phi _{ind}(s; T, D, X)&= B(X) (Y- \mu (\beta _0; X)) \end{aligned}$$
(18)
$$\begin{aligned} \phi _{out}(s; T, D, X)&= B(X) Y \end{aligned}$$
(19)
$$\begin{aligned} \phi _{pse}(s; T, D, X)&= B(X) (Y- \operatorname {E}(Y \mathbin {\mid }T > s, Z)). \end{aligned}$$
(20)
The proof can be found in the appendix.
The last term of (17) is structured with a part depending only on the underlying competing risks data and its distribution and a part depending only on the underlying censoring time and its distribution. Under the assumptions, this structure implies a similar structure of the resulting asymptotic variance matrices which makes clear how the variance has been increased by censoring and how the variance depends on the censoring distribution.
Corollary 1
In the setting of Theorem 1, the asymptotic variances \(\Sigma _{\textit{type}} = \operatorname {Var}(\dot{\beta }_{\textit{type}}(\tilde{T}, \tilde{D}, X))\) can be expressed as
$$\begin{aligned} \Sigma _{\textit{type}} = \Sigma + \operatorname {E}\big ( \int _0^{t-} \Phi _{\textit{type}}(s \mathbin {\mid }Z) \frac{S(s \mathbin {\mid }Z)}{G(s \mathbin {\mid }Z)} \Lambda (\textrm{d} \,s \mathbin {\mid }Z) \big ) \end{aligned}$$
(21)
where \(\Sigma = \operatorname {Var}(\dot{\beta }(T, D, X))\) is the variance of the uncensored problem, \(\Phi _{\textit{type}}(s \mathbin {\mid }Z) = \operatorname {Var}(\phi _{\textit{type}}(s; T, D, X) \mathbin {\mid }T > s, Z)\), and \(S(s \mathbin {\mid }z) = \operatorname {P}(T > s \mathbin {\mid }Z = z)\). Concretely,
$$\begin{aligned} \Phi _{ind}(s \mathbin {\mid }Z)&= \operatorname {Var}(B(X) (Y - \mu (\beta _0; X)) \mathbin {\mid }T > s, Z) \end{aligned}$$
(22)
$$\begin{aligned} \Phi _{out}(s \mathbin {\mid }Z)&= \operatorname {Var}(B(X) Y \mathbin {\mid }T > s, Z) \end{aligned}$$
(23)
$$\begin{aligned} \Phi _{pse}(s \mathbin {\mid }Z)&= \operatorname {Var}(B(X) (Y - \operatorname {E}(Y \mathbin {\mid }T> s, Z)) \mathbin {\mid }T > s, Z). \end{aligned}$$
(24)
Proof
The two terms of \(\dot{\beta }_{\textit{type}}(\tilde{T}, \tilde{D}, X)\) in (17) are uncorrelated owing to the independent censoring assumption: a martingale property applies to \(M(s \mathbin {\mid }Z)\) given the underlying information \((T, D, X)\), and the last term has mean 0 in the conditional distribution given \((T, D, X)\). The martingale property also implies that the variance of the last term is
$$\begin{aligned} \begin{aligned}&\operatorname {E}\big (\int _0^{t-} \big (\phi _{\textit{type}}(s; T, D, X) - \nu _{\textit{type}}(s \mathbin {\mid }Z) \big )^{\otimes 2} \\&\hspace{4em} \cdot \frac{\mathbbm {1}(T > s) \mathbbm {1}(C \ge s) (1- \Delta \Lambda (s \mathbin {\mid }Z))}{G(s \mathbin {\mid }Z)^2} \Lambda (\textrm{d} \,s \mathbin {\mid }Z) \big ) \end{aligned} \end{aligned}$$
(25)
which reduces to the desired expression under the independent censoring assumption since \(\operatorname {E}((\phi _{\textit{type}}(s; T, D, X) - \nu _{\textit{type}}(s \mathbin {\mid }Z) )^{\otimes 2} \mathbbm {1}(T > s) \mathbin {\mid }Z) = \operatorname {Var}(\phi _{\textit{type}}(s; T, D, X) \mathbin {\mid }T > s, Z) S(s \mathbin {\mid }Z)\) and \(\operatorname {E}(\mathbbm {1}(C \ge s) \mathbin {\mid }Z) (1- \Delta \Lambda (s \mathbin {\mid }Z)) = G(s- \mathbin {\mid }Z) (1- \Delta \Lambda (s \mathbin {\mid }Z)) = G(s \mathbin {\mid }Z)\). See for instance Chapter II of Andersen et al. (1993) for implications of the martingale property. Above, the notation \(a^{\otimes 2} = a a ^{\textsf{T}}\) is used for a column vector a. \(\square \)
With the asymptotic variances at hand, it is of interest to consider which of the types has the lower asymptotic variance and can therefore be expected to produce the lowest variance in parameter estimates, at least at larger sample sizes. The answer clearly depends on many things. In particular, it depends on how the three \(\Phi _{\textit{type}}(s \mathbin {\mid }Z)\) compare at various time points s, but also on the censoring hazard at these time points. It is worth pointing out that there is a lower bound to what can be achieved in terms of asymptotic variance by these three approaches. Note how \(\phi _{\textit{type}}\) is of the form \(B(X)(Y-f(X))\) for some function f of the covariates for each of the types of approach. For a given s this form has
$$\begin{aligned} \begin{aligned} \operatorname {Var}(B(X) (Y - f(X)) \mathbin {\mid }T> s, Z)&= \operatorname {Var}(\operatorname {E}(B(X) (Y - f(X)) \mathbin {\mid }T> s, X) \mathbin {\mid }T> s, Z) \\&+ \operatorname {E}(\operatorname {Var}(B(X) (Y - f(X)) \mathbin {\mid }T> s, X) \mathbin {\mid }T> s, Z) \\&= \operatorname {Var}(B(X) (\operatorname {E}(Y \mathbin {\mid }T> s, X) - f(X) ) \mathbin {\mid }T> s, Z) \\&+ \operatorname {E}(B(X) \operatorname {Var}(Y \mathbin {\mid }T> s, X) B(X)^{\textsf{T}}\mathbin {\mid }T > s, Z) \end{aligned} \end{aligned}$$
(26)
by the law of total variance. This implies that \(f(x) = \operatorname {E}(Y \mathbin {\mid }T > s, X=x)\) minimizes this conditional variance over functions of this form. In other words, letting \(\Phi (s \mathbin {\mid }Z) = \operatorname {Var}(B(X)(Y- \operatorname {E}(Y \mathbin {\mid }T> s, X)) \mathbin {\mid }T > s, Z)\), it holds that \(\Phi _{\textit{type}}(s \mathbin {\mid }Z) \ge \Phi (s \mathbin {\mid }Z)\) for each type. The variance expression of (21) now reveals how a comparably low variance among the three different types is achieved by a type of approach that has \(\Phi _{\textit{type}}(s \mathbin {\mid }Z)\) close to \(\Phi (s \mathbin {\mid }Z)\) at time points s where the censoring hazard is high.
Observation 1
Some observations concerning the comparison of the asymptotic variances of the three approaches are the following.
(a)
For the individual weighting approach, \(\mu (\beta _0;X)\) should be close to \(\operatorname {E}(Y \mathbin {\mid }T > s, X)\) for s close to 0 if the model holds. As a consequence, \(\Phi _\textit{ind}(s \mathbin {\mid }Z)\) is close to the lower bound \(\Phi (s \mathbin {\mid }Z)\) for s close to 0. If the censoring hazard is high early and low later, the individual weighting approach should produce a comparably low variance.
 
(b)
The outcome weighting approach should similarly have \(\Phi _\textit{out}(s \mathbin {\mid }Z)\) close to the lower bound \(\Phi (s \mathbin {\mid }Z)\) if \(\operatorname {E}(Y \mathbin {\mid }T > s, X)\) is close to 0. This may happen for instance for the outcome examples of failure and time lost before t if s is close to t. If the censoring hazard is low early on, but high when approaching the time point of interest, t, the outcome weighting approach should produce a comparably low variance for this type of outcome, at least under continuity.
 
(c)
Similarly, the pseudo-observation approach should have \(\Phi _\textit{pse}(s \mathbin {\mid }Z)\) close to the lower bound \(\Phi (s \mathbin {\mid }Z)\) if \(\operatorname {E}(Y \mathbin {\mid }T > s, Z)\) is close to \(\operatorname {E}(Y \mathbin {\mid }T > s, X)\). This can be expected to happen when the stratification, Z, is fine or when the outcome does not depend much on the covariates X. The examples given for the outcome weighting approach equally apply for the pseudo-observation approach. In fact, under continuity, \(\operatorname {E}(Y \mathbin {\mid }T > s, Z)\) is expected to be close to \(\operatorname {E}(Y \mathbin {\mid }T > s, X)\) for s approaching t no matter the outcome type since Y is determined by time t; they are for instance both close to 1 for the survival outcome.
 
(d)
In the case of categorical covariates X, and \(Z=X\), all three approaches have the same asymptotic variance according to (26). In fact, it seems that often even the parameter estimates will coincide, according to a result of Sect. 2.5 of Blanche et al. (2023) and a similar property for the pseudo-observations in line with results by Stute and Wang (1994). In this light, similar asymptotic variances may be expected when more general covariates are considered and a fine stratification is used.
 
Overall, the pseudo-observation approach may seem to have the best chance of producing a low variance, but these are of course rather vague observations. An example corroborating this observation is given in Sect. 3.
The outcome of failure is considered in detail by Blanche et al. (2023) and their Corollary 2 states that the difference in asymptotic variance matrix between the individual weighting approach and the outcome weighting approach can be negative definite or positive definite depending on the censoring distribution, which is in line with the observations above. It seems that this conclusion cannot immediately be transferred to, to give an example, the outcome of restricted survival time since \(\operatorname {E}(Y \mathbin {\mid }T > s, X)\) for s approaching t would be close to t rather than 0 in this case. As alluded to above, a similar conclusion can be reached in the comparison of the individual weighting approach and the pseudo-observation approach even for an outcome such as the restricted survival time.
A standard approach to estimating the variance of parameter estimates is to use the corresponding Huber–White-type sandwich variance estimator. For each type, \(U_{n,\textit{type}}(\beta )\) is of the form \(\sum _{i=1}^n u_{i,n,\textit{type}}(\beta )\). The corresponding standard sandwich estimate of the asymptotic variance is
$$\begin{aligned} n\left( \frac{\partial }{\partial \beta } U_{n,\textit{type}}(\beta )\right) ^{-1} \sum _{i=1}^n u_{i,n,\textit{type}}(\beta ) u_{i,n,\textit{type}}(\beta )^{\textsf{T}}\left( \frac{\partial }{\partial \beta } U_{n,\textit{type}}(\beta )^{\textsf{T}}\right) ^{-1} \end{aligned}$$
(27)
evaluated at the corresponding \(\beta \) estimate. This or a similar variance estimate is what tends to be used for producing standard errors of \(\beta \) estimates by statistical program packages when sampling weights are used or when robust standard errors are requested in regression procedures. It is of interest to establish results on the usefulness of this standard approach to variance estimation in the present setting. The next result establishes that the standard sandwich variance estimate will be conservative for large n for all three approaches when the model holds and it thereby extends Proposition 1 of Blanche et al. (2023).
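As a concrete illustration, the following sketch computes the standard sandwich estimate for the individual weighting approach with a logistic mean and \(A(\beta ;x) = x\), fitting by Newton iterations; it returns the estimate of the variance of the parameter estimate, i.e. (27) without the factor n. The function names are illustrative, and the weights \(\hat{W}_i\) would be passed as w.

```python
import numpy as np

def expit(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_weighted_logistic(X, y, w, iters=50):
    """Solve the weighted score equation (8) with A(beta;x) = x and a
    logistic mean, sum_i x_i w_i (y_i - expit(x_i' beta)) = 0, by Newton."""
    beta = np.zeros(X.shape[1])
    for _ in range(iters):
        mu = expit(X @ beta)
        score = X.T @ (w * (y - mu))
        J = (X * (w * mu * (1.0 - mu))[:, None]).T @ X  # -dU/dbeta
        beta = beta + np.linalg.solve(J, score)
    return beta

def sandwich_variance(X, y, w, beta):
    """Standard sandwich estimate of Var(beta_hat): bread^{-1} meat
    bread^{-T} with per-subject scores u_i = x_i w_i (y_i - mu_i).
    This equals (27) divided by n."""
    mu = expit(X @ beta)
    u = X * (w * (y - mu))[:, None]
    bread = (X * (w * mu * (1.0 - mu))[:, None]).T @ X
    Binv = np.linalg.inv(bread)
    return Binv @ (u.T @ u) @ Binv.T
```

As the next result shows, with estimated Kaplan–Meier-based weights this estimate tends to be conservative in the present setting.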
Theorem 2
In the setting of Theorem 1, for each of the three approaches, the standard sandwich variance estimator converges in probability to
$$\begin{aligned} \Sigma _{\textit{type}}' = \Sigma + \operatorname {E}\big ( \int _0^{t-} \Phi _{\textit{type}}'(s \mathbin {\mid }Z) \frac{S(s \mathbin {\mid }Z)}{G(s \mathbin {\mid }Z)} \Lambda (\textrm{d} \,s \mathbin {\mid }Z) \big ) \end{aligned}$$
(28)
where \(\Phi _{\textit{type}}'(s \mathbin {\mid }Z) = \operatorname {E}(\phi _{\textit{type}}(s; T, D, X) \phi _{\textit{type}}(s; T, D, X) ^{\textsf{T}}\mathbin {\mid }T > s, Z)\), and consequently, for any of the three types, \(\Sigma _{\textit{type}}' \ge \Sigma _{\textit{type}}\) with equality if and only if \(\operatorname {E}(\phi _{\textit{type}}(s; T, D, X) \mathbin {\mid }T > s, Z = z) = 0\) for \(\Lambda (\cdot \mathbin {\mid }z)\)-almost all s for almost all z.
The proof of the theorem can be found in the appendix.
Observation 2
A few observations concerning the asymptotic variance and variance estimation are given in the following.
(a)
The requirement for equality does not seem particularly reasonable in applications in any of the three cases, except perhaps for \(\phi _\textit{pse}\) in cases where \(\operatorname {E}(Y \mathbin {\mid }T > s, Z)\) is close to \(\operatorname {E}(Y \mathbin {\mid }T > s, X)\). In the simple case where X itself represents strata and can be obtained from Z, for instance \(X=Z\), equality is generally only obtained for the pseudo-observation approach.
 
(b)
For the individual and outcome weighting approaches, the limit \(\Sigma _{\textit{type}}'\) corresponds to what \(\Sigma _{\textit{type}}\) would have been if G had been postulated as the censoring distribution rather than estimated using the Kaplan–Meier estimator. This may be realized by close inspection of the appendix, where postulation of the censoring distribution eliminates \(\dot{u}_{\textit{type}}\) and thereby \(e_{\textit{type}}\) of Lemma 6 for these two types. It is perhaps to be expected that an estimating procedure that simply uses the presented weights for each individual is unable to pick up on how those weights came about. For the pseudo-observation approach, on the other hand, the fact that the weights are estimated means that the pseudo-observations, and thereby the standard sandwich variance estimate, are impacted.
 
(c)
It should be perfectly possible to estimate the asymptotic variance by estimating the influence function, plugging in the observed data points and evaluating the empirical variance of what is obtained. Such variance estimators have been considered for the individual and outcome weighting approach by Blanche et al. (2023) in their setting and for the pseudo-observation approach by Overgaard et al. (2017). The expression of the influence functions in Theorem 1 is less helpful here, but the results and approaches of the appendix may be. Such an alternative variance estimator may however have its own problems in small samples where estimating the asymptotic variance is less relevant. This was seen in Overgaard et al. (2018) in an example of the pseudo-observation approach.
 
(d)
Suppose for a moment that the censoring assumptions, Assumption 1 and Assumption 2, hold for a certain stratification denoted \(\tilde{Z}\), but that independent censoring, Assumption 1, also holds for a coarser stratification, Z. The independent censoring assumption would imply \(\Lambda (\textrm{d} \,s \mathbin {\mid }\tilde{Z}) / G(s \mathbin {\mid }\tilde{Z}) = \Lambda (\textrm{d} \,s \mathbin {\mid }Z) / G(s \mathbin {\mid }Z)\), symbolically. The law of total variance reveals how \(\operatorname {E}(\Phi _{type }(s \mathbin {\mid }\tilde{Z}) S(s \mathbin {\mid }\tilde{Z}) \mathbin {\mid }Z) = \operatorname {E}(\Phi _{type }(s \mathbin {\mid }\tilde{Z}) \mathbin {\mid }T > s, Z) S(s \mathbin {\mid }Z) \le \Phi _{type }(s \mathbin {\mid }Z) S(s \mathbin {\mid }Z)\) and thereby that a smaller asymptotic variance in (21) is obtained with a finer stratification for the types of individual and outcome weighting. But for these two types, it is also seen that \(\operatorname {E}(\Phi _{type }'(s \mathbin {\mid }\tilde{Z}) S(s \mathbin {\mid }\tilde{Z}) \mathbin {\mid }Z) = \operatorname {E}(\Phi _{type }'(s \mathbin {\mid }\tilde{Z}) \mathbin {\mid }T > s, Z) S(s \mathbin {\mid }Z) = \Phi _{type }'(s \mathbin {\mid }Z) S(s \mathbin {\mid }Z)\) such that the limit of the variance estimator in (28) does not pick up on the advantage. The situation is more unclear for the pseudo-observation approach since \(\phi _{\textit{pse}}\) of (20), somewhat obscured by the choice of notation, depends on the choice of stratification, but it seems a finer stratification will tend to reduce the asymptotic variance \(\Sigma _{type }\) as well as the limit \(\Sigma _{type }'\) if chosen appropriately to try to match \(\operatorname {E}(Y \mathbin {\mid }T > s, Z)\) with \(\operatorname {E}(Y \mathbin {\mid }T > s, X)\).
 
The stated results assume that the regression model \(\operatorname {E}(Y \mathbin {\mid }X) = \mu (\beta ; X)\) holds for some \(\beta \), also denoted \(\beta _0\). As can be seen from the results of the appendix, Lemma 5 and Lemma 6 specifically, the results do not change much under misspecification of the regression model. The three approaches will then estimate a best fit to the uncensored problem, which may be useful in some situations; this best fit will depend on the covariate distribution. Under misspecification of the regression model, the only real differences to the results stated earlier are that \(\beta _0\) should now refer to the best fit rather than a true \(\beta \) and that \(J(\beta _0)\), and thereby B(X), will have a more complicated expression, which can be found in equation (88) of the appendix. This does not change the observations made above much, except perhaps for one observation concerning the individual weighting approach: the conditional expectation \(\operatorname {E}(Y \mathbin {\mid }T > s, X)\) can no longer be expected to be as close to \(\mu (\beta _0; X)\) as before for s close to 0, and so \(\Phi _\textit{ind}(s \mathbin {\mid }Z)\) can no longer be expected to be as close to the lower bound \(\Phi (s \mathbin {\mid }Z)\) as before. Concerning variance estimation, statistical program packages may or may not use an appropriate estimate of the more general and complicated expression of \(J(\beta _0)\) relevant under misspecification of the regression model when producing the standard sandwich variance estimate. If they do, Theorem 2 will apply with the mentioned modifications. If they do not, there will be another source of bias of the variance estimate, which will not be examined more closely here. The discussion concerning the more complicated and the simpler expression of \(J(\beta _0)\) relates to the discussion of whether to use the observed information matrix or an estimate of the expected information matrix under the model.
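To make the role of \(J(\beta _0)\) concrete, below is a minimal sketch of a standard sandwich computation for a linear model, using the simple "bread" \(\operatorname {E}(A A^{\textsf{T}})\) that is valid when the linear model holds; under misspecification, an estimate of the more complicated expression of \(J(\beta _0)\) would replace it. Function name and setup are illustrative only, not the implementation behind the reported results.

```python
import numpy as np

def sandwich_variance(A, resid):
    """Huber-White sandwich estimate J^{-1} M J^{-1} / n for a linear model
    mu(beta; X) = beta^T A(X), with estimating function A(X) (Y - mu)."""
    n = len(resid)
    U = A * resid[:, None]          # per-observation estimating-function values
    J = A.T @ A / n                 # 'bread': simple expression of J(beta)
    M = U.T @ U / n                 # 'meat': empirical second moment of U
    Jinv = np.linalg.inv(J)
    return Jinv @ M @ Jinv / n

# Sanity check with the sample mean (A identically 1):
rng = np.random.default_rng(4)
y = rng.normal(0.0, 2.0, size=50_000)
V = sandwich_variance(np.ones((len(y), 1)), y - y.mean())
# V[0, 0] approximates Var(Y)/n = 4/50000
```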
A source of bias in the regression parameter estimates is violation of the independent censoring assumption. An attempt at examining this bias is made in Proposition 1 of the appendix, where a first-order approximation of the bias is found for the three approaches. In principle, this gives some insight into how the bias compares between the three approaches, but the result is approximate and the conclusion is not particularly clear. One observation concerns the scenario where Assumption 2 but not Assumption 1 holds. Here, Eq. (82) of the appendix reveals how a discrepancy between the true and fitted censoring hazard, according to the approximation,
(a)
will not contribute to bias at early time points close to 0 for the individual weighting approach if the regression model holds,
 
(b)
will not contribute to bias at late time points close to t for the outcome weighting approach for the outcome of failure and time lost before t,
 
(c)
will not contribute to bias at late time points close to t for the pseudo-observation approach generally.
 
Overall, it seems the pseudo-observation approach will have the best chance of mitigating bias from this sort of violation of the independent censoring assumption by choosing Z to be predictive of the outcome as well as the censoring.

3 A theoretical example

With an aim of comparing the asymptotic variances precisely, an example in a simple setting is now considered.
Suppose the covariate X indicates one of two groups expected to be equal in size, \(\operatorname {P}(X=0) = \operatorname {P}(X=1) = 1/2\). The event time is chosen to follow a uniform distribution in each of the two groups,
$$\begin{aligned} \operatorname {P}(T \le s \mathbin {\mid }X = 1)&= ps, \quad s \in [0, \frac{1}{p}], \\ \operatorname {P}(T \le s \mathbin {\mid }X = 0)&= qs, \quad s \in [0, \frac{1}{q}], \end{aligned}$$
for certain choices of \(p, q \in (0,1)\). The outcome of interest is \(Y = \mathbbm {1}(T \le 1)\), corresponding to a time point of interest \(t=1\). A true model is \(\mu (\beta ; X) = \beta _0 + \beta _1 X\) with \(\beta _0 = q\) and \(\beta _1 = p-q\). Censoring occurs with probability 0.5 at a single time point s, independently of all other variables.
The three approaches are used for estimation of the \(\beta \) parameters with the choice \(A(\beta ; X)=(1, X)^{\textsf{T}}\). No stratification will be used. The asymptotic variances for the three types are
$$\begin{aligned} \Sigma _{type } = \Sigma + \Phi _{type }(s) S(s) \end{aligned}$$
(29)
in this case according to Corollary 1. Differences in asymptotic variances, particularly the signs of the differences, are in this way determined by differences in \(\Phi _{type }(s)\).
The matrix \(J(\beta )\) is
$$\begin{aligned} J(\beta ) = \operatorname {E}\Big ( \begin{pmatrix} 1 & X \\ X & X \end{pmatrix} \Big ) = \begin{pmatrix} 1 & \frac{1}{2} \\ \frac{1}{2} & \frac{1}{2} \end{pmatrix} \end{aligned}$$
(30)
such that
$$\begin{aligned} J(\beta )^{-1} = \begin{pmatrix} 2 & -2 \\ -2 & 4 \end{pmatrix}. \end{aligned}$$
(31)
Calculations reveal
$$\begin{aligned} f_1(s):= \operatorname {E}(Y \mathbin {\mid }T> s)&= \frac{(p+q)(1-s)}{2 - ps - qs}, \\ f_2(s):= \operatorname {Cov}(Y, X \mathbin {\mid }T> s)&= \big (\frac{p(1-s)}{1-ps} - \frac{(p+q)(1-s)}{2 - ps - qs} \big ) \frac{1-ps}{2 - ps - qs}, \\ f_3(s) := \operatorname {Cov}(YX, X \mathbin {\mid }T> s)&= \frac{p(1-s)}{2 - ps - qs} \frac{1-qs}{2 - ps - qs}, \\ f_4(s) := \operatorname {Var}(X \mathbin {\mid }T > s)&= \frac{1-ps}{2 - ps - qs} \frac{1-qs}{2 - ps - qs}. \end{aligned}$$
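The four conditional moments can be checked by direct computation from the two uniform distributions; the sketch below does so exactly, via rational arithmetic, for one hypothetical choice of p, q, and s (the function name and values are illustrative only):

```python
from fractions import Fraction as F

def conditional_moments(p, q, s):
    """Direct computation of f_1, ..., f_4 in the two-group uniform model
    with Y = 1(T <= 1), P(X = 1) = 1/2, and s <= 1."""
    pTs = ((1 - p * s) + (1 - q * s)) / 2      # P(T > s)
    EX = (1 - p * s) / 2 / pTs                 # E(X | T > s)
    EY = (p + q) * (1 - s) / 2 / pTs           # E(Y | T > s)
    EYX = p * (1 - s) / 2 / pTs                # E(YX | T > s)
    f1 = EY
    f2 = EYX - EY * EX                         # Cov(Y, X | T > s)
    f3 = EYX * (1 - EX)                        # Cov(YX, X | T > s); X is binary
    f4 = EX * (1 - EX)                         # Var(X | T > s)
    return f1, f2, f3, f4

p, q, s = F(1, 2), F(1, 6), F(3, 10)           # hypothetical values
f1, f2, f3, f4 = conditional_moments(p, q, s)
D = 2 - p * s - q * s
# exact agreement with the displayed formulas
assert f1 == (p + q) * (1 - s) / D
assert f2 == (p * (1 - s) / (1 - p * s) - f1) * (1 - p * s) / D
assert f3 == p * (1 - s) / D * (1 - q * s) / D
assert f4 == (1 - p * s) / D * (1 - q * s) / D
```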
The asymptotic variance of the \(\beta _1\) component is considered by applying the vector \(a = (0, 1)^{\textsf{T}}\). Calculations reveal, for a general a,
$$\begin{aligned} a^{\textsf{T}}\Phi _{\textit{ind}}(s) a&= a^{\textsf{T}}\Phi _{\textit{out}}(s) a + \operatorname {Var}(a^{\textsf{T}}B(X) \mu (\beta ; X) \mathbin {\mid }T> s) \\&\quad - 2 \operatorname {Cov}(a ^{\textsf{T}}B(X) Y, a ^{\textsf{T}}B(X) \mu (\beta ; X) \mathbin {\mid }T> s), \\ a^{\textsf{T}}\Phi _{\textit{pse}}(s) a&= a^{\textsf{T}}\Phi _{\textit{out}}(s) a + f_1(s)^2 \operatorname {Var}(a ^{\textsf{T}}B(X) \mathbin {\mid }T > s) \\&\quad - 2 \operatorname {Cov}(a ^{\textsf{T}}B(X) Y, a ^{\textsf{T}}B(X) \mathbin {\mid }T > s) f_1(s). \end{aligned}$$
For the particular choice of vector \(a = (0, 1)^{\textsf{T}}\),
$$\begin{aligned} \operatorname {Var}(a^{\textsf{T}}B(X) \mu (\beta ; X) \mathbin {\mid }T> s)&= 4 (p+q)^2 f_4(s), \\ \operatorname {Cov}(a ^{\textsf{T}}B(X) Y, a ^{\textsf{T}}B(X) \mu (\beta ; X) \mathbin {\mid }T> s)&= 8 (p+q) f_3(s) - 4 (p+q) f_2(s), \\ \operatorname {Var}(a ^{\textsf{T}}B(X) \mathbin {\mid }T> s)&= 16 f_4(s) \\ \operatorname {Cov}(a ^{\textsf{T}}B(X) Y, a ^{\textsf{T}}B(X) \mathbin {\mid }T > s)&= 16 f_3(s) - 8 f_2(s) \end{aligned}$$
and thereby
$$\begin{aligned}&a^{\textsf{T}}(\Phi _{\textit{pse}}(s) - \Phi _{\textit{out}}(s) ) a = 16 f_1(s)(f_1(s) f_4(s) - 2 f_3(s) + f_2(s)) \\&= - 16 \frac{(p+q)(1-s)}{2 - ps - qs} \big ( \frac{q(1-s)}{2 - ps - qs} \big (\frac{1-ps}{2 - ps - qs}\big )^2 + \frac{p(1-s)}{2 - ps - qs} \big (\frac{1-qs}{2 - ps - qs}\big )^2\big ) \end{aligned}$$
which is negative for all \(s \in (0,1)\), indicating an asymptotic advantage of the pseudo-observation approach over the outcome weighting approach in this setting. Also,
$$\begin{aligned} a^{\textsf{T}}(\Phi _{\textit{ind}}(s) - \Phi _{\textit{out}}(s) ) a&= 4 (p+q)^2 f_4(s) - 16 (p+q) f_3(s) + 8 (p+q) f_2(s) \\&= 4 \frac{p+q}{(2 - ps - qs)^2} \big ( q(s(1-q)-(1-s))(1-ps) \\&\quad + p(s(1-p)-(1-s))(1-qs) \big ). \end{aligned}$$
Certainly for \(s < \min (\frac{1}{2-p},\frac{1}{2-q})\) the difference is negative, in favor of the individual weighting approach over the outcome weighting approach, while for \(s > \max (\frac{1}{2-p},\frac{1}{2-q})\) the difference is positive, in favor of the outcome weighting approach.
The comparison of the pseudo-observation approach and the individual weighting approach may be obtained by subtracting the two expressions from each other. The difference is in favor of the pseudo-observation approach for \(s=1\) and is 0 for \(s=0\). Seemingly, the difference for any \(s \in (0,1)\) will be an intermediate value and so in favor of the pseudo-observation approach over the individual weighting approach.
For the estimation of the intercept parameter, \(\beta _0\), the vector \(a= (1, 0) ^{\textsf{T}}\) is instead considered. With this choice,
$$\begin{aligned} \operatorname {Var}(a^{\textsf{T}}B(X) \mu (\beta ; X) \mathbin {\mid }T> s)&= 4 q^2 f_4(s), \\ \operatorname {Cov}(a ^{\textsf{T}}B(X) Y, a ^{\textsf{T}}B(X) \mu (\beta ; X) \mathbin {\mid }T> s)&= 4 q f_3(s) - 4 q f_2(s), \\ \operatorname {Var}(a ^{\textsf{T}}B(X) \mathbin {\mid }T> s)&= 4 f_4(s), \\ \operatorname {Cov}(a ^{\textsf{T}}B(X) Y, a ^{\textsf{T}}B(X) \mathbin {\mid }T > s)&= 4 f_3(s) - 4 f_2(s), \end{aligned}$$
which leads to
$$\begin{aligned}&a^{\textsf{T}}(\Phi _{\textit{pse}}(s) - \Phi _{\textit{out}}(s) ) a = 4 f_1(s)(f_1(s) f_4(s) - 2 f_3(s) + 2 f_2(s)) \\&= 4 \frac{(p+q)(1-s)^2(1-ps)}{(2 - ps - qs)^3} \big ( (p+q)\frac{1-qs}{2-ps-qs} - 2 q\big ). \end{aligned}$$
This is in favor of the pseudo-observation approach over the outcome weighting approach when and only when \((p+q)(1-qs) < 2q (2-ps-qs)\), which happens when \(s < \frac{3q - p}{q(p+q)}\). Depending on the values of p and q, this condition may hold for no s in the interval (0, 1), for all such s, or only for part of the interval. Next,
$$\begin{aligned} a^{\textsf{T}}(\Phi _{\textit{ind}}(s) - \Phi _{\textit{out}}(s) ) a&= 4 q ( q f_4(s) - 2 f_3(s) + 2 f_2(s)) \\&= 4 q^2 \frac{1-ps}{2-ps-qs} \frac{s(2-q) - 1}{2-ps-qs} \end{aligned}$$
which is in favor of the individual weighting approach when \(s(2-q)-1 < 0\), that is, when \(s < \frac{1}{2-q}\). With this result, examples can be found where the asymptotic variance of the \(\beta _0\) estimate is smaller for the individual weighting approach than for the pseudo-observation approach. Take for instance \(p=1/2\), \(q=1/6\), and \(s \in (0, 1/(2-\frac{1}{6}) )\).
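These sign claims for the \(\beta _0\) component can be confirmed numerically at the suggested values \(p=1/2\), \(q=1/6\); the sketch below simply evaluates the closed-form expressions above (the function name is illustrative only):

```python
def intercept_variance_differences(p, q, s):
    """Evaluate a^T(Phi_ind - Phi_out)a and a^T(Phi_pse - Phi_out)a
    for a = (1, 0)^T using the conditional moments f_1, ..., f_4."""
    D = 2 - p * s - q * s
    f1 = (p + q) * (1 - s) / D
    f2 = (p * (1 - s) / (1 - p * s) - f1) * (1 - p * s) / D
    f3 = p * (1 - s) / D * (1 - q * s) / D
    f4 = (1 - p * s) * (1 - q * s) / D**2
    ind_minus_out = 4 * q * (q * f4 - 2 * f3 + 2 * f2)
    pse_minus_out = 4 * f1 * (f1 * f4 - 2 * f3 + 2 * f2)
    return ind_minus_out, pse_minus_out

p, q = 0.5, 1 / 6
for s in (0.1, 0.3, 0.5):                 # all below 1/(2 - q) = 6/11
    ind_out, pse_out = intercept_variance_differences(p, q, s)
    assert ind_out < 0                    # individual beats outcome weighting
    assert pse_out > 0                    # outcome weighting beats pseudo here
    assert ind_out - pse_out < 0          # individual beats pseudo-observation
```

Here \(3q - p = 0\), so the pseudo-observation approach is never favored over the outcome weighting approach for \(\beta _0\), while the individual weighting approach is favored for all s below 6/11.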
This simple theoretical example illustrates that settings can be found in which any one of the three approaches has the smallest asymptotic variance.

4 Simulations

To gain insights into the behavior of the three approaches in finite samples, a simulation study has been conducted as described in the following. Three separate scenarios are considered in this simulation study, each with different configurations. Each configuration is simulated 10,000 times.
Scenario I—cumulative incidence. This scenario considers the comparison of the cumulative incidence, or risk, in two groups by a risk difference. The purpose is a comparison of the three approaches in a simple setting under different censoring distributions and various sample sizes. The simple setting corresponds to the theoretical example given in Sect. 3. First, the group \(X \in \{0,1\}\) is drawn with equal probability \(\operatorname {P}(X=x) = 0.5\) in the two groups. The time to event, T, is drawn according to \(\operatorname {P}(T \le s \mathbin {\mid }X=x) = p_x s\) for \(s \in (0, 1/p_x)\), where the choices \(p_0 = 1/6\) and \(p_1 = 1/2\) are used. The time point \(t=1\) is the time point of interest in the following, and the outcome \(Y = \mathbbm {1}(T \le 1)\) is the outcome of interest. A simple linear model, \(\mu (\beta ; X) = \beta _0 + \beta _1 X\), is considered such that \(\beta _1\) is the risk difference and the unknown parameter to be estimated. The specification above makes \(\beta _1 = p_1 - p_0 = 1/3\) the true value of the risk difference at the time point of interest. The censoring distribution will be independent of T and X. Three censoring distributions are considered: one where about \({50}\%\) are censored at the early time point 0.2, \(\operatorname {P}(C = 0.2) = 0.5\), and the rest remain uncensored; one where about \({50}\%\) are censored at the late time point 0.8, \(\operatorname {P}(C = 0.8) = 0.5\), and the rest remain uncensored; and an exponential censoring distribution, \(\operatorname {P}(C > s) = \exp (-s)\). Samples of sizes \(n \in \{50, 100, 200, 400, 800\}\) are considered. The three approaches are used unstratified, that is, the overall Kaplan–Meier estimator of the censoring distribution is used for the weights, and with \(A(\beta ; X) = (1, X)^{\textsf{T}}\).
Note that the choice \(\operatorname {P}(C = s) = 0.5\) with the rest remaining uncensored, that is, \(\operatorname {P}(C \ge 1) = 0.5\), will have \(\Delta \Lambda (s)/G(s) = 0.5/0.5 = 1\) and is able to pick out the remaining integrand in the last part of (21) at the chosen time point s, for instance \(s=0.2\) or \(s=0.8\). In other words, \(\Sigma _{type } = \Sigma + \Phi _{type }(s) S(s)\) in this case.
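A minimal sketch of Scenario I with the exponential censoring distribution and the individual weighting approach may look as follows (function names and implementation details are illustrative, not the code used for the reported simulations; the Kaplan–Meier step assumes no ties, which holds almost surely here):

```python
import numpy as np

def censoring_km(time, event):
    """Kaplan-Meier estimate of G(s) = P(C > s), treating censorings
    (event == 0) as the events of interest; assumes no ties."""
    order = np.argsort(time)
    time, event = time[order], event[order]
    at_risk = len(time) - np.arange(len(time))
    factors = np.where(event == 0, 1.0 - 1.0 / at_risk, 1.0)
    return time, np.cumprod(factors)

def G_left(times, surv, s):
    """Left limit G(s-), vectorized over s."""
    idx = np.searchsorted(times, s, side="left")
    return np.where(idx > 0, surv[np.maximum(idx - 1, 0)], 1.0)

rng = np.random.default_rng(2)
n, t0 = 40_000, 1.0
X = rng.integers(0, 2, n)
p = np.where(X == 1, 0.5, 1 / 6)
T = rng.uniform(size=n) / p       # P(T <= s | X) = p_X * s on (0, 1/p_X)
C = rng.exponential(size=n)       # P(C > s) = exp(-s)
Tobs = np.minimum(T, C)
delta = T <= C

# Y = 1(T <= t0) is observable iff min(T, t0) <= C
observable = delta | (Tobs >= t0)
Y = (delta & (Tobs <= t0)).astype(float)

times, surv = censoring_km(Tobs, delta.astype(int))
w = observable / G_left(times, surv, np.minimum(Tobs, t0))

# individual weighting: weighted least squares for mu = beta0 + beta1 * X
A = np.column_stack([np.ones(n), X])
beta = np.linalg.solve((A * w[:, None]).T @ A, (A * w[:, None]).T @ Y)
# beta[1] should be close to the true risk difference p_1 - p_0
```

The outcome weighting and pseudo-observation approaches would reuse the same censoring Kaplan–Meier but weight, respectively, the outcome itself or construct jackknife pseudo-values.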
Fig. 1
Observed variances of \(\beta _1\) estimates scaled by n for the three approaches, as well as for the underlying approach based on uncensored data, in each of the three censoring distribution settings in scenario I
Table 1
Simulation results of Scenario I: Observed variance in \(\beta _1\) estimates (\(\operatorname {Var}\)), mean of corresponding standard sandwich variance estimates (\(\widehat{\operatorname {Var}}\)), and coverage probability of Wald-type \({95}\%\) confidence intervals based on standard sandwich variance estimates (\(\%\)) for each type of approach (ind, out, and pse) and in each configuration of censoring distribution (Cens) and number of observations (n)

| Cens | n | \(\operatorname {Var}_{ind}\) | \(\operatorname {Var}_{out}\) | \(\operatorname {Var}_{pse}\) | \(\widehat{\operatorname {Var}}_{ind}\) | \(\widehat{\operatorname {Var}}_{out}\) | \(\widehat{\operatorname {Var}}_{pse}\) | \(\%_{ind}\) | \(\%_{out}\) | \(\%_{pse}\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0.2 | 50 | 1.55 | 1.85 | 1.56 | 1.47 | 1.89 | 1.62 | 92.6 | 94.5 | 94.7 |
| 0.2 | 100 | 1.51 | 1.80 | 1.51 | 1.47 | 1.87 | 1.58 | 93.8 | 94.9 | 95.1 |
| 0.2 | 200 | 1.50 | 1.79 | 1.49 | 1.47 | 1.86 | 1.56 | 94.1 | 95.1 | 95.2 |
| 0.2 | 400 | 1.46 | 1.79 | 1.46 | 1.46 | 1.85 | 1.55 | 94.7 | 95.2 | 95.5 |
| 0.2 | 800 | 1.48 | 1.81 | 1.48 | 1.46 | 1.85 | 1.54 | 94.9 | 95.4 | 95.7 |
| 0.8 | 50 | 1.26 | 1.08 | 1.06 | 1.21 | 1.06 | 1.05 | 93.7 | 94.1 | 94.1 |
| 0.8 | 100 | 1.20 | 1.05 | 1.04 | 1.19 | 1.05 | 1.03 | 94.7 | 94.5 | 94.6 |
| 0.8 | 200 | 1.20 | 1.07 | 1.04 | 1.18 | 1.05 | 1.03 | 94.7 | 94.8 | 94.7 |
| 0.8 | 400 | 1.18 | 1.04 | 1.01 | 1.18 | 1.05 | 1.02 | 95.0 | 95.1 | 95.1 |
| 0.8 | 800 | 1.18 | 1.05 | 1.02 | 1.17 | 1.04 | 1.02 | 94.8 | 95.0 | 94.8 |
| exp | 50 | 1.82 | 1.78 | 1.58 | 1.71 | 1.78 | 1.61 | 92.7 | 94.2 | 94.4 |
| exp | 100 | 1.75 | 1.73 | 1.53 | 1.67 | 1.76 | 1.56 | 93.9 | 94.9 | 94.8 |
| exp | 200 | 1.69 | 1.72 | 1.50 | 1.65 | 1.75 | 1.54 | 94.3 | 95.0 | 95.1 |
| exp | 400 | 1.65 | 1.71 | 1.48 | 1.65 | 1.74 | 1.53 | 94.9 | 95.2 | 95.4 |
| exp | 800 | 1.62 | 1.69 | 1.46 | 1.64 | 1.74 | 1.52 | 95.0 | 95.4 | 95.3 |

Observed and estimated variances are scaled by n (approximating the asymptotic scale) for easy comparison
Key results of the simulations in this scenario are summarized in Fig. 1 and Table 1, where the focus is on the variance of \(\beta _1\) estimates. Observed biases for \(\beta _1\) in the approaches seem negligible and are not presented here. As is illustrated in Fig. 1, the observed variance of the parameter estimates is in line with what is suggested by the theory and the theoretical example of Sect. 3, especially for the larger sample sizes: the pseudo-observation approach produces a comparably low variance in this setting in all three censoring distribution configurations; the individual weighting approach produces a comparably low variance when censoring occurs at an early time point; the outcome weighting approach produces a reasonably low variance when censoring occurs at a late time point. It can also be seen how the individual weighting approach eventually produces a lower variance than the outcome weighting approach in the exponential censoring distribution configuration in this setting, which is not immediately clear from the theory already presented. The fact that losing information to censoring results in larger variances of parameter estimates, as seen in Corollary 1, is illustrated by the gap from the observed variances of the three approaches to the observed variance of the underlying approach based on uncensored data. It seems reasonable that the gap is smaller when censoring occurs at a late time point since less information is lost.
Table 1 gives further insights into the variance estimation using the standard, Huber–White-type sandwich variance estimator: in most cases, the average variance estimate is larger than the observed variance of the parameter estimates, as suggested by Theorem 2. Somewhat surprisingly, the opposite does happen even for larger sample sizes, at least for the individual weighting approach. A calculation reveals that the asymptotic difference is quite small in this case, and it apparently cannot be expected to show up in simulations at this sample size and number of iterations. Overall, the differences between observed variance and average variance estimate are fairly small, and the corresponding coverage probabilities of Wald-type \({95}\%\) confidence intervals using these variance estimates are quite close to \({95}\%\), at least for the larger sample sizes. At lower sample sizes, some degree of underperformance in terms of coverage probability is seen, particularly for the individual weighting approach.
To sum up this scenario, the pseudo-observation approach generally wins out in terms of producing low variance of the estimates of the important regression parameter in this case, and the standard sandwich variance estimator even produces reasonable variance estimates for the pseudo-observation approach, leading to reasonable coverage probabilities.
Scenario II - restricted mean, a misspecified model, and covariate-dependent censoring. In this scenario, the restricted mean at different values of continuous covariates is compared using differences. The purpose is a comparison of the three approaches with a new outcome type in a more complicated situation involving more covariates, continuous covariates, misspecification of the regression model, and covariate-dependent censoring. Three independent continuous covariates are considered, \(X=(X_1, X_2, X_3)\), where \(X_1 \sim N(0,1)\), \(X_2 \sim U(0,1)\), and \(X_3 \sim \Gamma (\text {shape } 3, \text {scale } 0.5)\). Given X, the event time T follows a Weibull distribution with shape parameter 1.5 and rate parameter \(\exp (-2 + X_1 + X_2/6 + X_3/2 + X_2 \cdot X_3/4)^{\frac{1}{1.5}}\). The time point of interest is \(t=1\) and the outcome of interest is then \(Y = T \wedge 1\). The model considered is \(\mu (\beta ; X) = \beta _0 + \beta _1 X_1 + \beta _2 X_2 + \beta _3 X_3\). No attempt will be made to find \(\operatorname {E}(Y \mathbin {\mid }X)\), but it is apparent that \(\mu (\beta ; X)\) is misspecified. Given X, the censoring distribution is independent of T and follows a Weibull distribution with shape parameter 1.5 and rate parameter \(\exp (-0.5 + X_2)^{\frac{1}{1.5}}\). In particular, the censoring distribution depends on the uniformly distributed \(X_2\). The three approaches are used with \(A(\beta ; X) = (1, X_1, X_2, X_3)^{\textsf{T}}\), and the sample size \(n=1000\) is considered throughout. It is expected that handling the covariate-dependent censoring by stratification will be useful. The stratification variable Z is constructed from \(X_2\) by first choosing a number of strata k and then letting \(Z = j\) when \(j/k < X_2 \le (j+1)/k\) for \(j = 0, \dots , k-1\). The three approaches are considered with \(k \in \{1, 2, 4, 8\}\). The conditional censoring hazard rates are proportional with proportionality factor \(\exp (X_2)\).
This implies that the maximal conditional censoring hazard rate differs from the minimal by a factor of \(\exp (1/k)\) within each stratum when k strata are considered. These factors are 2.72, 1.65, 1.28, and 1.13 for the considered values of k. Note that although the model is misspecified, the fits from the three approaches should on average match the best fit from the uncensored problem if the censoring mechanism is handled appropriately according to the theory.
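The stratum construction and the quoted within-stratum hazard-ratio factors can be reproduced in a few lines (a sketch; the function and variable names are illustrative only):

```python
import math

def stratum(x2, k):
    """Assign Z = j when j/k < x2 <= (j+1)/k, for j = 0, ..., k-1."""
    return min(math.ceil(x2 * k) - 1, k - 1) if x2 > 0 else 0

# Hazards are proportional to exp(X2) and X2 spans an interval of
# length 1/k within each stratum, giving a max/min hazard ratio of:
factors = {k: math.exp(1 / k) for k in (1, 2, 4, 8)}
# rounds to 2.72, 1.65, 1.28, and 1.13
```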
Fig. 2
Average parameter estimates for each of three parameters, \(\beta _1\), \(\beta _2\), and \(\beta _3\), for each of the three approaches as well as for the corresponding approach on the underlying uncensored data according to the number of strata used in the estimation of the censoring distribution in scenario II
Fig. 3
Observed variances and variances estimated by the standard sandwich variance estimator, both scaled by \(n=1000\), for estimates of each of the three parameters \(\beta _1\), \(\beta _2\), and \(\beta _3\) and for each of the three approaches according to the number of strata used in the estimation of the censoring distribution in scenario II
As can be seen from Fig. 2, all three approaches produce parameter estimates that on average resemble the average parameter estimates from the uncensored data when a large degree of stratification is used. The level of resemblance increases with the number of strata. At a low number of strata, the outcome weighting approach produces the worst resemblance and the pseudo-observation approach the best resemblance in this setting. This is in line with an earlier observation that the pseudo-observation approach may have the best chance of having a limit close to the best fit in uncensored data.
The results on variance and variance estimation are presented in Fig. 3. Notably, the outcome weighting approach produces a very large variance of the \(\beta _2\) estimates, at least at a low number of strata, in comparison with the other approaches. As suggested by an earlier observation, the variance of these parameter estimates for the outcome weighting approach does decrease with the number of strata used in the estimation of the censoring distribution. As also suggested by an earlier observation, this decrease is not seen for the corresponding standard sandwich variance estimates. A similar pattern is seen for the individual weighting approach for the \(\beta _2\) parameter, but at a much lower level of variance. Overall, the pseudo-observation approach produces the smallest variances of parameter estimates, and these observed variances seem well reflected by the corresponding average variance estimates. Perhaps surprisingly, the observed variances for this approach tend, if anything, to increase with the number of strata, but given the scale it is fair to say that they do not change much with the number of strata. Several other apparent deviations from the earlier theoretical and asymptotic observations are seen: observed variances increasing with the number of strata for the individual and outcome weighting approaches, and variance estimates that are on average smaller than the observed variances. It is worth noting that some of these theoretical observations were made under the assumption of a true regression model and independent censoring within strata, which does not hold in this example.
In conclusion, this scenario demonstrates a setting with a total failure of the outcome weighting approach in terms of variance and also the failure of the standard sandwich variance estimator, at least for the individual and outcome weighting approaches.
Scenario III - risk ratios and stratifications in a factorial design. A \(2^5\) factorial design is considered in this scenario with complete symmetry in the 5 factors. Focus is on estimation of the risk ratio related to one factor while taking the other factors into account. The purpose is a comparison of the three approaches at low sample sizes and with a potentially heavy degree of stratification. There is a potential for \(2^5 = 32\) strata. Let a stratum be given by \(x = (x_1, x_2, \dots , x_5)\), where \(x_1, \dots , x_5 \in \{0,1\}\) denote the 5 factors. In such a given stratum, T is drawn from a uniform distribution on \((0, 1/(0.1 \cdot 1.25^{x_1 + x_2 + \cdots + x_5}))\), and the outcome \(Y = \mathbbm {1}(T \le 1)\) is considered. The expectation is \(\operatorname {E}(Y \mathbin {\mid }X=x) = 0.1 \cdot 1.25^{x_1 + \cdots + x_5}\). For the estimation procedure, consider the model specified by \(\mu (\beta ; x) = \exp (\beta _0 + \beta _1 \cdot x_1 + \cdots + \beta _5 \cdot x_5)\). This means the true values have \(\exp (\beta _j) = 1.25\) for \(j=1, \dots , 5\). The parameter \(\beta _1\) is considered of primary interest. The censoring time is drawn, independently of T, from a uniform distribution on (0, 5/3). Configurations with 2, 6, and 12 observations per stratum are considered, in other words \(n \in \{64, 192, 384\}\). The three approaches are used with \(A(\beta ; X) = \frac{\partial }{\partial \beta } \mu (\beta ; X)\) and stratification on no factors, the first factor, the first three factors, or all five factors. Note that the leave-one-out calculations of the pseudo-observation approach are carried out at their lower limit of sample size in the case of stratification on all factors and only 2 observations per stratum.
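The true parameter values of the log-linear model follow directly from the specification \(\operatorname {E}(Y \mathbin {\mid }X=x) = 0.1 \cdot 1.25^{x_1 + \cdots + x_5}\); the sketch below (illustrative only) recovers them exactly from the 32 strata:

```python
import itertools
import numpy as np

# all 2^5 = 32 strata of the factorial design
strata = np.array(list(itertools.product([0, 1], repeat=5)), dtype=float)
mu = 0.1 * 1.25 ** strata.sum(axis=1)        # true E(Y | X = x)

# fit log mu = beta0 + beta1*x1 + ... + beta5*x5 by least squares;
# the fit is exact since log mu is exactly linear in x
design = np.column_stack([np.ones(len(strata)), strata])
beta, *_ = np.linalg.lstsq(design, np.log(mu), rcond=None)
# exp(beta_j) = 1.25 for j = 1, ..., 5, and exp(beta_0) = 0.1
```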
Table 2
Convergence percentage (pc), coverage probability of Wald-type \({95}\%\) confidence intervals based on the standard sandwich variance estimate (%), median scaled standard sandwich variance estimate (\(\widehat{\operatorname {Var}}\)), and a scaled variance expression based on the median absolute deviation of parameter estimates (\(\operatorname {Var}\)) for each of the three approaches in each configuration of strata used in censoring distribution estimation (k) and number of observations per stratum (n/32)

| k | n/32 | \(pc_{ind}\) | \(pc_{out}\) | \(pc_{pse}\) | \(\%_{ind}\) | \(\%_{out}\) | \(\%_{pse}\) | \(\widehat{\operatorname {Var}}_{ind}\) | \(\widehat{\operatorname {Var}}_{out}\) | \(\widehat{\operatorname {Var}}_{pse}\) | \(\operatorname {Var}_{ind}\) | \(\operatorname {Var}_{out}\) | \(\operatorname {Var}_{pse}\) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 0 | 2 | 73.9 | 89.5 | 91.4 | 81.0 | 84.9 | 89.9 | 71.5 | 97.0 | 98.8 | 166.2 | 135.8 | 117.2 |
| 0 | 6 | 99.7 | 100.0 | 99.9 | 94.0 | 97.7 | 97.7 | 70.9 | 69.7 | 63.0 | 73.6 | 55.6 | 51.3 |
| 0 | 12 | 100.0 | 100.0 | 100.0 | 96.7 | 97.2 | 97.2 | 49.5 | 46.8 | 43.5 | 44.7 | 38.6 | 35.7 |
| 1 | 2 | 73.6 | 89.7 | 91.4 | 80.5 | 83.7 | 90.3 | 71.5 | 100.0 | 103.7 | 171.4 | 145.3 | 127.2 |
| 1 | 6 | 99.8 | 100.0 | 100.0 | 94.9 | 98.1 | 97.8 | 71.0 | 70.7 | 64.1 | 71.0 | 54.7 | 51.6 |
| 1 | 12 | 100.0 | 100.0 | 100.0 | 97.1 | 97.3 | 97.0 | 50.1 | 47.0 | 43.7 | 45.3 | 39.4 | 38.5 |
| 3 | 2 | 73.1 | 89.0 | 90.6 | 80.9 | 82.6 | 86.6 | 72.6 | 102.7 | 104.2 | 169.4 | 174.4 | 173.7 |
| 3 | 6 | 99.6 | 99.9 | 99.9 | 94.9 | 97.9 | 97.7 | 72.2 | 75.6 | 70.2 | 70.4 | 58.8 | 57.8 |
| 3 | 12 | 100.0 | 100.0 | 100.0 | 97.2 | 97.3 | 96.9 | 49.6 | 47.9 | 44.9 | 42.8 | 39.3 | 39.8 |
| 5 | 2 | 77.3 | 88.6 | 81.8 | 83.8 | 87.3 | 73.0 | 55.3 | 105.0 | 66.1 | 92.7 | 123.8 | 291.7 |
| 5 | 6 | 99.6 | 99.9 | 100.0 | 93.0 | 98.0 | 89.3 | 77.5 | 92.5 | 106.8 | 95.7 | 78.0 | 129.9 |
| 5 | 12 | 100.0 | 100.0 | 100.0 | 96.8 | 97.5 | 96.7 | 52.7 | 51.6 | 54.4 | 45.1 | 42.9 | 43.7 |
The results of Scenario III are summarized in Table 2. Due to possible non-convergence for this estimation problem at small sample sizes, the convergence percentage is reported; convergence is required to happen within 20 iterations. The remaining statistics concern the selected replications where convergence is achieved for all three approaches. Due to possible outliers among parameter estimates and variance estimates, the median variance estimate scaled by n and the scaled variance expression \(\frac{n}{\Phi ^{-1}(3/4)^2} \operatorname {MAD}^2\), based on the median absolute deviation of the parameter estimates, \(\operatorname {MAD}\), and the standard normal cumulative distribution function, \(\Phi \), are presented as robust alternatives to observed means and variances.
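The robust variance expression \(\frac{n}{\Phi ^{-1}(3/4)^2} \operatorname {MAD}^2\) can be computed as follows (a sketch with illustrative names; for a normal distribution, \(\operatorname {MAD}/\Phi ^{-1}(3/4)\) is a consistent estimate of the standard deviation):

```python
import numpy as np
from statistics import NormalDist

def mad_scaled_variance(estimates, n):
    """Robust n-scaled variance: n * (MAD / Phi^{-1}(3/4))^2, where MAD is
    the median absolute deviation of replication-wise parameter estimates."""
    est = np.asarray(estimates, dtype=float)
    mad = np.median(np.abs(est - np.median(est)))
    c = NormalDist().inv_cdf(0.75)          # Phi^{-1}(3/4) ~ 0.6745
    return n * (mad / c) ** 2

# sanity check: estimates ~ N(beta, sigma^2/n) should give roughly sigma^2
rng = np.random.default_rng(3)
n = 384
est = rng.normal(0.22, 2.0 / np.sqrt(n), size=100_000)
val = mad_scaled_variance(est, n)           # close to sigma^2 = 4.0
```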
The individual weighting approach is seen to have the most convergence problems at small sample sizes in this setting. The coverage probabilities are generally too low for the three approaches at the small sample sizes, while they are slightly too large at the largest sample size. The pseudo-observation approach seems to have the largest problems in terms of coverage at a large degree of stratification and lower sample sizes. As suggested by the coverage probabilities, the summary statistics for variance and variance estimation can be quite far off in settings with small sample sizes. At the largest sample size, the variance estimate summary is larger than the variance expression for the observed parameter estimates, which is in line with the theoretical observations made earlier. At the largest sample size and the lower degrees of stratification, the pseudo-observation approach produces the lowest numbers in this setting, whereas the outcome weighting approach produces the lowest numbers at a larger degree of stratification. Overall, nothing seems to be gained in terms of variance for any of the three approaches by applying a larger degree of stratification in this setting.
As a conclusion, this scenario has certainly crossed into territory where none of the three approaches works well. The scenario gives an example where the individual weighting approach seems more likely not to achieve convergence, an example where the pseudo-observation approach is more vulnerable to too much stratification in the estimation of the censoring distribution, and an example of a setting where not much is gained by applying stratification.

5 Discussion

This paper has extended some of the results presented in Sect. 2 of Blanche et al. (2023): in addition to the individual and outcome weighting approaches, called IPCW-GLM and OIPCW by Blanche et al. (2023), respectively, the pseudo-observation approach is now also considered; the type of outcome and type of model are now more general and not restricted to the logistic model; more results on the consequences of stratification in the estimation of the censoring distribution are given; and the presented results do not depend on continuity of the involved distributions, which may be useful when emulating expressions empirically. A main point is also an extension of a result from Blanche et al. (2023): none of the three approaches will generally have the lowest asymptotic variance. Which approach produces the lowest variance of parameter estimates will depend on the setting, and the expressions given in this paper should help judge which one will win out. If censoring primarily happens early in the time interval, this should favor the individual weighting approach, while if censoring primarily happens late in the time interval, this should favor the outcome weighting and pseudo-observation approaches. Theoretical observations and simulations in this paper suggest the pseudo-observation approach may tend to have the most advantages, such as smaller variance of parameter estimates and smaller bias from a misspecified censoring distribution. If it is of interest to use the standard sandwich variance estimator, this variance estimator also seems to be most appropriate for the pseudo-observation approach. On the other hand, the pseudo-observation approach can be expected to be the most time-consuming approach, and it has also been seen to work less well when only few observations are available per stratum.
Additionally, the simulation settings used in this paper may favor the pseudo-observation approach by design by having a larger censoring rate, or rather \( \Lambda (\textrm{d} \,s)/G(s)\), at later rather than earlier time points. This may be realistic, but it is of course not guaranteed in applications. One simulation scenario saw the outcome weighting approach produce a tremendously large variance of parameter estimates in a setting concerning the restricted event time, \(T \wedge t\). The theory suggests this happens because \(\operatorname {Var}(B(X) (T \wedge t) \mathbin {\mid }T > s, Z)\) is large for s close to t. Arguably, this is a problem with the outcome definition rather than with the outcome weighting approach. A similar model could be studied by focusing on, and weighting, the outcome \(t - T \wedge t\), the time lost before t, and the model could, in principle, be changed along the same lines to \(\tilde{\mu }(\beta ; X) = t - \mu (\beta ; X)\) without changing the uncensored problem. With this outcome, \(\operatorname {Var}(B(X) (t - T \wedge t) \mathbin {\mid }T > s, Z)\) would be small for s close to t, which is expected to resolve the issue, although this was not pursued here. This issue did not arise in Scheike et al. (2008), where focus was on risk, but it may be worthwhile to keep in mind when using the outcome weighting approach with other types of outcomes.
There are indications in the theory that stratification should help reduce bias in regression parameter estimates from covariate-dependent censoring and that a high degree of stratification may help reduce the asymptotic variance even when independent censoring is already achieved at a lower degree of stratification. Simulation scenario II, however, demonstrates how reducing the bias by increasing the degree of stratification may come at the cost of a larger variance of regression parameter estimates. For instance, if a low mean squared error relative to the best fit in the uncensored problem is desired, estimation of the parameter \(\beta _2\) in that scenario may preferably be carried out using the pseudo-observation approach with a low degree of stratification. It is therefore difficult to provide general practical guidance on stratification granularity based on the results presented here. In the specific setting of simulation scenario II, it may be worth noting that the within-stratum maximal to minimal censoring hazard ratio was 2.72 for one stratum, 1.65 for two strata, 1.28 for four strata, and 1.13 for eight strata. If the resemblance of estimates to the best fit of the uncensored problem is deemed sufficient at, for instance, two strata for the individual weighting approach and four strata for the outcome weighting approach, these choices correspond to maximal within-stratum relative differences in censoring hazard rates of \({65}\%\) and \({28}\%\), respectively, in that scenario. On the other hand, simulation scenario III gives an example with \(n=192\) and five parameters to be estimated where performance in terms of coverage probabilities drops markedly between \(6 \cdot 2^{5-3} = 24\) observations per stratum and 6 observations per stratum for the individual weighting approach and the pseudo-observation approach.
In other words, there is certainly a limit to how few observations per stratum it is possible to work with.
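As a side observation on these particular numbers (an inference about the pattern, not a statement from the scenario's definition), the reported within-stratum ratios happen to follow \(e^{1/k}\) for k equal strata, as would arise if the censoring hazard were proportional to \(e^u\) for a covariate u ranging over a unit interval:

```python
import math

# Hypothetical illustration: if the censoring hazard were proportional to
# exp(u) for a covariate u in [0, 1], splitting [0, 1] into k equal strata
# would give a within-stratum maximal-to-minimal hazard ratio of exp(1/k).
ratios = {k: math.exp(1.0 / k) for k in (1, 2, 4, 8)}
# Rounded to two decimals this gives 2.72, 1.65, 1.28, 1.13 --
# the pattern of the ratios reported for simulation scenario II.
```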
A theme of this paper has been the bias of the standard sandwich variance estimator, as seen in Theorem 2. Under the assumptions, the bias is upward and leads to conservative inference if confidence intervals and statistical tests are based on this variance estimator. This applies to all three approaches. In simulation scenario II, the variance estimate was seen to be much too large for the outcome weighting approach in some settings. In Blanche et al. (2023), a simulation setting reveals considerable bias of the variance estimator in the individual weighting approach. The bias of the variance estimator in the pseudo-observation approach was studied by Overgaard et al. (2018), where the bias was found to be large in extreme cases but tolerable in many less extreme ones. This seems to be in line with the simulation results of this paper. Generally, it should be possible to construct a more asymptotically appropriate variance estimate by emulating the asymptotic variance expression empirically. Suggestions along these lines are given in Appendix A.2 of Blanche et al. (2023) for the individual and outcome weighting approaches and at the end of Sect. 3 of Overgaard et al. (2017) for the pseudo-observation approach. These variance estimators have not been studied in this paper.
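For reference, the standard sandwich variance estimator discussed throughout has the generic M-estimation form \(\hat{J}^{-1} \hat{B} \hat{J}^{-\textsf{T}}/n\). A minimal sketch, with hypothetical names, where the per-subject estimating-equation terms and the derivative matrix J are supplied by the caller:

```python
import numpy as np

def sandwich_variance(u, J):
    """Standard sandwich estimate of Var(beta_hat).

    u : (n, p) array of per-subject estimating-equation terms at beta_hat
    J : (p, p) matrix, minus the averaged derivative of the terms w.r.t. beta
    """
    n = u.shape[0]
    B = u.T @ u / n                  # empirical 'meat'
    Jinv = np.linalg.inv(J)          # 'bread'
    return Jinv @ B @ Jinv.T / n

# Toy check: two subjects with terms +1 and -1 and J = identity.
V = sandwich_variance(np.array([[1.0], [-1.0]]), np.eye(1))
```

The bias discussed above arises because the plug-in 'meat' ignores the estimated weights; the empirically emulated alternatives cited replace u with the full influence-function terms.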
The three approaches studied in this paper are rather simple and should not generally be expected to be efficient. A possible refinement is to consider an actual model of the censoring distribution and to use fits from such a model in the weights. This is considered by, for instance, Robins and Rotnitzky (1992) for the individual weighting approach, by Scheike et al. (2008) for the outcome weighting approach, and by Binder et al. (2014) for the pseudo-observation approach. By better fitting the censoring distribution, a reduction in the asymptotic variance of the regression parameter estimates can be expected. Such approaches may also be augmented by including additional terms in the estimating equation that are designed to reduce the asymptotic variance. Including such terms tends to require a working model for the event time and type as well. Such an approach is also considered by Blanche et al. (2023), citing Robins and Rotnitzky (1992) and Bang and Tsiatis (2000). It is interesting that the augmented approach of Blanche et al. (2023) results in the same asymptotic variance for both types, as stated in their Proposition 2. On closer inspection of the augmentation term, it can be seen that it reduces \(\Phi _{type }(s \mathbin {\mid }Z)\) to \(\Phi (s \mathbin {\mid }Z)\), in the notation of Observation 1, for both types. As discussed by Blanche et al. (2023), it may in practice be difficult to estimate the augmentation term precisely, and this task is further complicated if modeling choices are required to be prespecified. Another refined approach is considered in Martinussen and Scheike (2023), where the observed-data efficient influence function is emulated directly in a setting concerning risk regression.
Recently, an approach based on so-called censoring unbiased transformations was suggested by Sandqvist (2024); in this context, it can best be seen as a refinement of the pseudo-observation approach that uses working models of the censoring and outcome distributions to calculate improved pseudo-observations while obtaining double robustness and oracle efficiency properties. It has been beyond the scope of this paper to study these approaches more closely. In contrast, the focus here has been on simple methods that can be applied with access only to standard tools for statistical analysis, such as tools for the Kaplan–Meier estimator, estimation of parameters in a generalized linear model, and the standard sandwich variance estimator. The assumptions considered do not seem overly restrictive. The most restrictive assumptions are likely the ones imposed on the censoring mechanism, and these can be made less restrictive by appealing to stratification in the estimation of the censoring distribution. It would be of interest to study whether a considerable amount of efficiency can in fact be gained by applying the more advanced approaches mentioned above, or whether a reasonable degree of efficiency can be achieved in many settings by applying an appropriate amount of stratification.

Acknowledgements

The author would like to thank Jan Pedersen for helpful discussions on the subject and insightful comments on drafts of this paper.

Declarations

Conflict of interest

The author has no conflict of interest to declare.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Title: A comparison of Kaplan–Meier-based inverse probability of censoring weighted regression methods
Author: Morten Overgaard
Publication date: 28-10-2025
Publisher: Springer US
Published in: Lifetime Data Analysis / Issue 4/2025
Print ISSN: 1380-7870 · Electronic ISSN: 1572-9249
DOI: https://doi.org/10.1007/s10985-025-09669-8

Appendix

A Technical results in a broader setting

In this appendix, some technical results that will help to establish the primary results of this paper are considered. Sufficient regularity conditions, including positivity, are imposed, but it is of interest not to invoke the independent censoring assumption at this stage. This means that the estimators considered may not be consistent for their intended estimand. This is specifically the case for the estimates of \(\Lambda \) and G, related to the conditional censoring distribution. As is made clear below, the modified Nelson–Aalen estimate of \(\Lambda (\cdot \mathbin {\mid }z)\) uses empirical estimates of \(\check{S}(s \mathbin {\mid }z) = \operatorname {P}(T > s, C \ge s \mathbin {\mid }Z = z)\) and \(\tilde{F}_0(s \mathbin {\mid }z) = \operatorname {P}(\tilde{T} \le s, \tilde{D} = 0 \mathbin {\mid }Z = z)\), and the limits of the estimates of \(\Lambda \) and G are therefore instead
$$\begin{aligned} \Lambda ^*(s \mathbin {\mid }z)&= \int _0^s \frac{1}{\check{S}(u \mathbin {\mid }z)} \tilde{F}_0(\textrm{d} \,u \mathbin {\mid }z), \end{aligned}$$
(32)
$$\begin{aligned} G^*(s \mathbin {\mid }z)&= \prod _{u \in (0, s]} \big (1 - \Lambda ^*(\textrm{d} \,u \mathbin {\mid }z)\big ), \end{aligned}$$
(33)
at least under a condition of positivity, \(\check{S}(s \mathbin {\mid }z) > 0\) for (almost) all z. In the following, the notation \({\mathscr{X}} = (T, D, Z, X)\) is used for the underlying information, \(\tilde{\mathscr{X}} = (\tilde{T}, \tilde{D}, Z)\) is used for the information used in estimation of the weights, and \(\bar{{\mathscr{X}}} = (\tilde{T}, \tilde{D}, Z, X)\) is used for the observed information used in the estimating equation. Also, let
$$\begin{aligned} M(\tilde{{\mathscr{X}}}; s \mathbin {\mid }z) = N(\tilde{{\mathscr{X}}}; s; z) - \int _0^s R(\tilde{{\mathscr{X}}}; u; z) \Lambda ^*(\textrm{d} \,u \mathbin {\mid }z), \end{aligned}$$
(34)
where
$$\begin{aligned} N(\tilde{{\mathscr{X}}}; s; z)&= \mathbbm {1}(\tilde{T} \le s, \tilde{D} = 0, Z = z), \end{aligned}$$
(35)
$$\begin{aligned} R(\tilde{\mathscr{X}}; s; z)&= \mathbbm {1}(\tilde{T} > s, Z = z) + \mathbbm {1}(\tilde{T} = s, \tilde{D}=0, Z=z). \end{aligned}$$
(36)
The notation \(p_Z(z) = \operatorname {P}(Z=z)\) is used for the distribution of Z and \(\tilde{S}(s \mathbin {\mid }z) = \operatorname {P}(\tilde{T} > s \mathbin {\mid }Z=z)\) is used for the conditional distribution of \(\tilde{T}\). It may be noted that \(\tilde{S}(s \mathbin {\mid }z) = \check{S}(s \mathbin {\mid }z) (1 - \Delta \Lambda ^*(s \mathbin {\mid }z))\) without invoking further assumptions. To be more explicit on the estimation of the weights, consider data on n replications and let \(\bar{N}(s; z) = \frac{1}{n} \sum _{i=1}^n N(\tilde{\mathscr{X}}_i; s; z)\) and \(\bar{R}(s; z) = \frac{1}{n} \sum _{i=1}^n R(\tilde{\mathscr{X}}_i; s; z)\). Then the estimates of the cumulative censoring hazard and the censoring survival function are given by
$$\begin{aligned}&\hat{\Lambda }(s \mathbin {\mid }z) = \int _0^s \frac{1}{\bar{R}(u; z)} \bar{N}(\textrm{d} \,u; z), \end{aligned}$$
(37)
$$\begin{aligned}&\hat{G}(s \mathbin {\mid }z) = \prod _{u \in (0, s]} \big (1 - \hat{\Lambda }(\textrm{d} \,u \mathbin {\mid }z)\big ). \end{aligned}$$
(38)
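For a single stratum, the estimates (37)–(38) amount to a Nelson–Aalen step for the censoring hazard, with the tie convention of (36) that keeps subjects censored at s in the risk set at s, followed by a product-limit step. A minimal sketch (the function name is hypothetical):

```python
import numpy as np

def censoring_km(time, delta):
    """Censoring cumulative hazard and survival within one stratum.

    time  : observed times T~ = min(T, C)
    delta : event indicator D~ (1 = event, 0 = censored)

    Follows (35)-(38): the 'events' counted by N are the censorings, and the
    risk set at s contains those with T~ > s plus those censored exactly at s.
    """
    time = np.asarray(time, dtype=float)
    delta = np.asarray(delta, dtype=int)
    jump_times = np.unique(time[delta == 0])     # jump times of the censoring process
    n = time.size
    Lam, G = 0.0, 1.0
    G_path = {}
    for s in jump_times:
        dN = np.sum((time == s) & (delta == 0)) / n
        R = (np.sum(time > s) + np.sum((time == s) & (delta == 0))) / n
        dLam = dN / R                            # Nelson-Aalen increment, eq. (37)
        Lam += dLam
        G *= 1.0 - dLam                          # product-limit step, eq. (38)
        G_path[s] = G
    return Lam, G_path

# Tiny worked example: censorings at times 2 (twice) and 4.
Lam, G_path = censoring_km([1, 2, 2, 3, 4], [1, 0, 0, 1, 0])
```

In the example, the increment at time 2 is (2/5)/(4/5) = 0.5, so \(\hat{G}(2 \mid z) = 0.5\), and the last subject is censored with no one else at risk, so \(\hat{G}\) drops to 0 at time 4.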
A useful result for approximating the applied weights is the following.
Lemma 1
For a given z with \(p_Z(z) > 0\), it holds, for any s such that \(\tilde{S}(s \mathbin {\mid }z) > 0\), that
$$\begin{aligned} \sup _{u \in [0,s]} \big |\frac{1}{\hat{G}(u \mathbin {\mid }z)} - \frac{1}{G^*(u \mathbin {\mid }z)} + \frac{1}{n} \sum _{i=1}^n \frac{1}{G^*(u \mathbin {\mid }z)} \frac{\dot{G}(\tilde{\mathscr{X}}_i; u \mathbin {\mid }z)}{G^*(u \mathbin {\mid }z)}\big | = o_{\operatorname {P}}(n^{-1/2}) \end{aligned}$$
(39)
where
$$\begin{aligned} \dot{G}(\tilde{\mathscr{X}}; u \mathbin {\mid }z) = - G^*(u \mathbin {\mid }z) \int _0^u \frac{1}{\tilde{S}(v \mathbin {\mid }z) p_Z(z)} M(\tilde{\mathscr{X}}; \textrm{d} \,v \mathbin {\mid }z) \end{aligned}$$
(40)
Proof
One way of proving this result is to take a functional approach similar to that of Overgaard et al. (2019). Viewing the estimator as the application of a functional, mapping between spaces of functions of bounded p-variation, to the empirical distribution of the data on \(\tilde{\mathscr{X}}\), the result is obtained by finding the derivative of the functional.
For any distribution function, F, of \(\tilde{\mathscr{X}}\), at-risk and censoring-indicating functionals, which take expectations with respect to that distribution, can be defined by
$$\begin{aligned} R(F; s; z)&= \int \mathbbm {1}(T > s, C \ge s, Z = z) \textrm{d} \,F, \\ N(F; s; z)&= \int \mathbbm {1}(\tilde{T} \le s, \tilde{D} = 0, Z = z) \textrm{d} \,F. \end{aligned}$$
Next, a corresponding cumulative censoring hazard can be defined by \(\Lambda (F; s \mathbin {\mid }z) = \int _0^s \frac{1}{R(F; u; z)} N(F; \textrm{d} \,u ; z)\), and the corresponding censoring distribution is given by the product integral \(G(F; s \mathbin {\mid }z) = \prod _{u \in (0, s]} (1 - \Lambda (F; \textrm{d} \,u \mathbin {\mid }z))\). Finally, a functional defined by
$$\begin{aligned} \phi (F; s \mathbin {\mid }z) = \frac{1}{G(F; s \mathbin {\mid }z)} \end{aligned}$$
(41)
will result in \(1/\hat{G}( \cdot \mathbin {\mid }z)\) when applied to the empirical distribution, and it may be noted that \(1/G^*( \cdot \mathbin {\mid }z)\) is obtained when \(\phi \) is applied to the true distribution. A derivative on the form
$$\begin{aligned} \phi _F'(f; s \mathbin {\mid }z) = - \frac{1}{G(F; s \mathbin {\mid }z)} \frac{G_F'(f; s \mathbin {\mid }z)}{G(F; s \mathbin {\mid }z)} \end{aligned}$$
(42)
is expected, where, according to differentiability results on the product integral,
$$\begin{aligned} G_F'(f; s \mathbin {\mid }z) = - G(F; s \mathbin {\mid }z) \int _0^s \frac{1}{1- \Delta \Lambda (F; u \mathbin {\mid }z)} \Lambda _F'(f; \textrm{d} \,u \mathbin {\mid }z) \end{aligned}$$
(43)
and, owing to bilinearity of the integral and linearity of R and N,
$$\begin{aligned} \Lambda _F'(f; s \mathbin {\mid }z) = \int _0^s \frac{1}{R(F; u; z)} N(f; \textrm{d} \,u; z) - \int _0^s \frac{R(f; u; z)}{R(F; u; z)^2} N(F; \textrm{d} \,u \mathbin {\mid }z). \end{aligned}$$
(44)
With these expressions in hand, it can be seen that the result follows if it can be argued that \(\phi (F_n; \cdot \mathbin {\mid }z)\) is sufficiently close to \(\phi (F_0; \cdot \mathbin {\mid }z) + \phi _{F_0}'(F_n - F_0; \cdot \mathbin {\mid }z)\) for large n where \(F_n\) is the empirical distribution and \(F_0\) the true distribution. Note for instance how \(\Lambda _{F_0}'(F_n - F_0; s \mathbin {\mid }z) = \frac{1}{n} \sum _i \int _0^s \frac{1}{R(F_0;u;z)} M(\tilde{\mathscr{X}}_i; \textrm{d} \,u \mathbin {\mid }z)\) and \((1- \Delta \Lambda (F_0; u \mathbin {\mid }z))R(F_0;u;z) = (1- \Delta \Lambda ^*(u \mathbin {\mid }z)) \check{S}(u \mathbin {\mid }z) p_Z(z) = \tilde{S}(u \mathbin {\mid }z) p_Z(z)\). Under the stated assumptions, the convergence result in p-variation over [0, s]
$$\begin{aligned} \Vert \phi (F_n; \cdot \mathbin {\mid }z) - \phi (F_0; \cdot \mathbin {\mid }z) - \phi _{F_0}'(F_n - F_0; \cdot \mathbin {\mid }z)\Vert _{[p]} = O_{\operatorname {P}}(n^{2\frac{1-p}{p}}) \end{aligned}$$
(45)
holds for \(p \in (1, 2)\) and so the same convergence order holds in supremum norm over [0, s]. The result is based on the convergence order \(\Vert F_n - F_0\Vert = O_{\operatorname {P}}(n^{(1-p)/p})\) in a norm based on p-variation and that \(\phi \) is more than once continuously differentiable, yielding a first order remainder of order \(O(\Vert F_n - F_0\Vert ^2)\) as \(F_n\) approaches \(F_0\). The required convergence order is obtained for \(p > 4/3\), but faster convergence orders, almost \(O_{\operatorname {P}}(n^{-1})\), can also be obtained by this argument by considering p close to 2. See Overgaard et al. (2019) for further details on the choice of norm and the differentiability results. \(\square \)
Lemma 2
Assuming \(\tilde{S}(t \mathbin {\mid }Z) > 0\) almost surely, the estimator
$$\begin{aligned} \hat{\theta }= \frac{1}{n} \sum _{i=1}^n \hat{W}_i Y_i \end{aligned}$$
(46)
has first order influence function
$$\begin{aligned} \dot{\theta }(\tilde{\mathscr{X}}) = W^*Y - \operatorname {E}(W^* Y) + \int _0^{t-} e^*(u \mathbin {\mid }Z) M(\tilde{\mathscr{X}}; \textrm{d} \,u \mathbin {\mid }Z) \end{aligned}$$
(47)
and second order influence function
$$\begin{aligned} \begin{aligned} \ddot{\theta }(\tilde{\mathscr{X}}_1, \tilde{\mathscr{X}}_2)&= \int _0^{t-} \big (W_1^* Y_1 \frac{\mathbbm {1}(\tilde{T}_1> s) \mathbbm {1}(Z_1=Z_2)}{\tilde{S}(s \mathbin {\mid }Z_2) p_Z(Z_2)} \\&\hspace{-2em} - e^*(s \mathbin {\mid }Z_2) \big (\frac{R(\tilde{\mathscr{X}}_1; s; Z_2)}{\check{S}(s \mathbin {\mid }Z_2) p_Z(Z_2)} - \dot{\Lambda }(\tilde{\mathscr{X}}_1; s \mathbin {\mid }Z_2) \big ) \big ) M(\tilde{\mathscr{X}}_2; \textrm{d} \,s \mathbin {\mid }Z_2) \\&+ \int _0^{t-} \big (W_2^* Y_2 \frac{\mathbbm {1}(\tilde{T}_2 > s) \mathbbm {1}(Z_2=Z_1)}{\tilde{S}(s \mathbin {\mid }Z_1) p_Z(Z_1)} \\&\hspace{-2em} - e^*(s \mathbin {\mid }Z_1) \big (\frac{R(\tilde{\mathscr{X}}_2; s; Z_1)}{\check{S}(s \mathbin {\mid }Z_1) p_Z(Z_1)} - \dot{\Lambda }(\tilde{\mathscr{X}}_2; s \mathbin {\mid }Z_1) \big ) \big ) M(\tilde{\mathscr{X}}_1; \textrm{d} \,s \mathbin {\mid }Z_1) \\ \end{aligned} \end{aligned}$$
(48)
where \(W^* = \mathbbm {1}(C \ge T \wedge t)/G^*(T \wedge t- \mathbin {\mid }Z)\) and \(e^*(s \mathbin {\mid }z) = \operatorname {E}(W^*Y \mathbin {\mid }\tilde{T} > s, Z = z)\) and
$$\begin{aligned} \dot{\Lambda }(\tilde{\mathscr{X}}; s \mathbin {\mid }z) = \int _0^s \frac{1}{\tilde{S}(u \mathbin {\mid }z) p_Z(z)}M(\tilde{\mathscr{X}}; \textrm{d} \,u \mathbin {\mid }z). \end{aligned}$$
(49)
Proof
In similarity to the proof of Lemma 1, it is possible to use a functional approach, and the functionals defined in that proof are reused here. Specifically, it is possible to see the estimator as an evaluation of a functional
$$\begin{aligned} \theta : F \mapsto \int W(F) Y \textrm{d} \,F = \int \frac{\mathbbm {1}(C \ge T \wedge t)}{G(F; \tilde{T} \wedge t- \mathbin {\mid }Z)} Y \textrm{d} \,F \end{aligned}$$
(50)
at the empirical distribution of \(\tilde{\mathscr{X}}\). The first order derivative of the functional \(\theta \) is given by
$$\begin{aligned} \theta _F'(f) = \int W(F) Y \textrm{d} \,f + \int W_F'(f) Y \textrm{d} \,F \end{aligned}$$
(51)
while the second order derivative is given by
$$\begin{aligned} \theta _F''(f, g) = \int W_F'(g) Y \textrm{d} \,f + \int W_F'(f) Y \textrm{d} \,g + \int W_F''(f, g) Y \textrm{d} \,F. \end{aligned}$$
(52)
Here,
$$\begin{aligned} W_F'(f) = -\frac{\mathbbm {1}(C \ge T \wedge t)}{G(F; \tilde{T} \wedge t- \mathbin {\mid }Z)} \frac{G_F'(f; \tilde{T} \wedge t- \mathbin {\mid }Z)}{G(F; \tilde{T} \wedge t- \mathbin {\mid }Z)} \end{aligned}$$
(53)
and
$$\begin{aligned} \begin{aligned} W_F''(f,g)&= 2 \frac{\mathbbm {1}(C \ge T \wedge t)}{G(F; \tilde{T} \wedge t- \mathbin {\mid }Z)} \frac{G_F'(f; \tilde{T} \wedge t- \mathbin {\mid }Z)}{G(F; \tilde{T} \wedge t- \mathbin {\mid }Z)} \frac{G_F'(g; \tilde{T} \wedge t- \mathbin {\mid }Z)}{G(F; \tilde{T} \wedge t- \mathbin {\mid }Z)} \\&- \frac{\mathbbm {1}(C \ge T \wedge t)}{G(F; \tilde{T} \wedge t- \mathbin {\mid }Z)} \frac{G_F''(f, g; \tilde{T} \wedge t- \mathbin {\mid }Z)}{G(F; \tilde{T} \wedge t- \mathbin {\mid }Z)}. \end{aligned} \end{aligned}$$
(54)
At this stage it may be worthwhile to introduce a functional by
$$\begin{aligned} \Gamma _F(f; s \mathbin {\mid }z) = -\frac{G_F'(f; s \mathbin {\mid }z)}{G(F; s \mathbin {\mid }z)} = \int _0^s \frac{1}{1- \Delta \Lambda (F; u \mathbin {\mid }z)} \Lambda _F'(f; \textrm{d} \,u \mathbin {\mid }z). \end{aligned}$$
(55)
Note that
$$\begin{aligned} \Lambda _F''(f,g; s \mathbin {\mid }z) = - \int _0^s \frac{R(g; u; z)}{R(F; u; z)} \Lambda _F'(f; \textrm{d} \,u \mathbin {\mid }z) - \int _0^s \frac{R(f; u; z)}{R(F; u; z)} \Lambda _F'(g; \textrm{d} \,u \mathbin {\mid }z) \end{aligned}$$
(56)
such that, in a short notation,
$$\begin{aligned} \begin{aligned} W_F''(f,g)&= W(F) \Big (\Gamma _F(f; \tilde{T} \wedge t- \mathbin {\mid }Z) \Gamma _F(g; \tilde{T} \wedge t- \mathbin {\mid }Z) \\&+ \int _0^{\tilde{T} \wedge t-} \big (\Delta \Gamma _F(g; u \mathbin {\mid }Z) - \frac{R(g; u; Z)}{R(F; u; Z)}\big ) \Gamma _F(f; \textrm{d} \,u \mathbin {\mid }Z) \\&- \int _0^{\tilde{T} \wedge t-} \frac{R(f; u; Z)}{R(F; u; Z)} \Gamma _F(g; \textrm{d} \,u \mathbin {\mid }Z) \Big ). \end{aligned} \end{aligned}$$
(57)
Using that \(\Gamma _F(f; \tilde{T} \wedge t- \mathbin {\mid }z) - \Gamma _F(f; u- \mathbin {\mid }z) = \int _{u-}^{\tilde{T} \wedge t-} \Gamma _F(f; \textrm{d} \,v \mathbin {\mid }z)\) and changing the order of integration, the expression
$$\begin{aligned} \begin{aligned} W_F''(f,g)&= \int _0^{t-} W(F) \mathbbm {1}(\tilde{T}> u) (\Gamma _F(g; u \mathbin {\mid }Z) - \frac{R(g; u; Z)}{R(F; u; Z)}) \Gamma _F(f; \textrm{d} \,u \mathbin {\mid }Z) \\&+ \int _0^{t-} W(F) \mathbbm {1}(\tilde{T} > u) (\Gamma _F(f; u \mathbin {\mid }Z) - \frac{R(f; u; Z)}{R(F; u; Z)}) \Gamma _F(g; \textrm{d} \,u \mathbin {\mid }Z) \end{aligned} \end{aligned}$$
(58)
is obtained. Also, in the same terms,
$$\begin{aligned} W_F'(f) = \int _0^{t-} W(F) \mathbbm {1}(\tilde{T} > u ) \Gamma _F(f; \textrm{d} \,u \mathbin {\mid }Z). \end{aligned}$$
(59)
This leads to the expressions
$$\begin{aligned} \theta _F'(f) =\int W(F) Y \textrm{d} \,f + \int \int _0^{t-} W(F) Y \mathbbm {1}(\tilde{T} > u) \Gamma _F(f; \textrm{d} \,u \mathbin {\mid }Z) \textrm{d} \,F \end{aligned}$$
(60)
and
$$\begin{aligned} \begin{aligned} \theta _F''(f,g)&= \int \int _0^{t-} W(F) Y \mathbbm {1}(\tilde{T}> u) \Gamma _F(g; \textrm{d} \,u \mathbin {\mid }Z) \textrm{d} \,f \\&+ \int \int _0^{t-} W(F) Y \mathbbm {1}(\tilde{T}> u) \Gamma _F(f; \textrm{d} \,u \mathbin {\mid }Z) \textrm{d} \,g \\&+ \int \int _0^{t-} W(F) Y \mathbbm {1}(\tilde{T}> u) (\Gamma _F(g; u \mathbin {\mid }Z) - \frac{R(g; u; Z)}{R(F; u; Z)}) \Gamma _F(f; \textrm{d} \,u \mathbin {\mid }Z) \textrm{d} \,F \\&+ \int \int _0^{t-} W(F) Y \mathbbm {1}(\tilde{T} > u) (\Gamma _F(f; u \mathbin {\mid }Z) - \frac{R(f; u; Z)}{R(F; u; Z)}) \Gamma _F(g; \textrm{d} \,u \mathbin {\mid }Z) \textrm{d} \,F. \end{aligned} \end{aligned}$$
(61)
To obtain the first order influence function, evaluate the first order derivative at the true \(F=F_0\) and the direction \(f = \delta _{\tilde{\mathscr{X}}} - F_0\) where \(\delta _{\tilde{\mathscr{X}}}\) corresponds to the Dirac measure at a given \(\tilde{\mathscr{X}}\). Note that (for z such that \(p_Z(z) > 0\))
$$\begin{aligned} \Gamma _{F_0}(\delta _{\tilde{\mathscr{X}}} - F_0; s \mathbin {\mid }z) = \int _0^s \frac{\mathbbm {1}(Z = z)}{\tilde{S}(u \mathbin {\mid }z) p_Z(z)} M(\tilde{\mathscr{X}}; \textrm{d} \,u \mathbin {\mid }Z). \end{aligned}$$
(62)
Since Z and z can be used interchangeably on \(Z=z\), this allows for a useful change in the order of integration such that, since
$$\begin{aligned} e^*(s \mathbin {\mid }z) = \int W(F_0) Y \frac{\mathbbm {1}(\tilde{T} > s, Z = z)}{\tilde{S}(s \mathbin {\mid }z) p_Z(z)} \textrm{d} \,F_0, \end{aligned}$$
(63)
the desired expression of the first order influence function is obtained. To obtain the second order influence function, evaluate the second order derivative at the true \(F=F_0\) and directions \(f=\delta _{\tilde{\mathscr{X}}_1} - F_0\) and \(g=\delta _{\tilde{\mathscr{X}}_2} - F_0\). Using the same arguments as for the first order influence function and with a slight elimination of terms, the desired expression is obtained.
The functional approach used here can be formalized in a p-variation setting as is done in Overgaard et al. (2019), but this does impose restrictive requirements on the outcome Y, namely bounded p-variation on the time argument of the function y from Sect. 2. The implied influence functions likely apply more generally. \(\square \)
The notation in (49) is slightly misleading since \(\dot{\Lambda }\) is not generally the exact influence function of the \(\Lambda \) estimate. The close connection of \(\dot{\Lambda }\) to the influence function can however be realized from the proof. In the continuous case there is no difference.
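The estimator \(\hat{\theta }\) of Lemma 2 underlies the pseudo-observation approach, where pseudo-observations are commonly computed by the leave-one-out jackknife, \(\hat{\theta }_i = n \hat{\theta } - (n-1) \hat{\theta }^{(-i)}\) (assumed here to be the construction in use; for \(\hat{\theta }\) itself, each leave-one-out recomputation also re-estimates the weights, which is what makes the expansion of Lemma 2 nontrivial). A minimal sketch with a hypothetical helper, checked on the plain sample mean, for which the pseudo-observations recover the data points exactly:

```python
import numpy as np

def jackknife_pseudo(estimator, data):
    """Jackknife pseudo-observations: theta_i = n*theta_hat - (n-1)*theta_hat_(-i)."""
    data = np.asarray(data, dtype=float)
    n = data.size
    full = estimator(data)                                        # theta_hat
    loo = np.array([estimator(np.delete(data, i)) for i in range(n)])
    return n * full - (n - 1) * loo

# Sanity check: for the sample mean, n*mean - (n-1)*mean_without_i = x_i.
x = np.array([2.0, 5.0, 11.0])
p = jackknife_pseudo(np.mean, x)
```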
Recall that the estimating equation for each type is of the form \(U_{n,type }(\beta ) = 0\) for \(U_{n,type }(\beta ) = \sum _i u_{n,i,type }(\bar{\mathscr{X}}_i;\beta )\) where specifically
$$\begin{aligned} u_{n,i,\textit{ind}}(\bar{\mathscr{X}}_i;\beta )&= A(\beta ; X_i) \hat{W}_i (Y_i - \mu (\beta ; X_i)), \end{aligned}$$
(64)
$$\begin{aligned} u_{n,i,\textit{out}}(\bar{\mathscr{X}}_i;\beta )&= A(\beta ; X_i) (\hat{W}_i Y_i - \mu (\beta ; X_i)), \end{aligned}$$
(65)
$$\begin{aligned} u_{n,i,\textit{pse}}(\bar{\mathscr{X}}_i;\beta )&= A(\beta ; X_i) (\hat{\theta }_i - \mu (\beta ; X_i)). \end{aligned}$$
(66)
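As a concrete illustration of the three per-subject terms (64)–(66), the sketch below assumes a logistic mean model and the common GLM choice \(A(\beta ; X) = X\); both are illustrative assumptions, as the paper's setting is more general. The estimated weights \(\hat{W}_i\) and pseudo-observations \(\hat{\theta }_i\) are taken as given inputs.

```python
import numpy as np

def mu(beta, X):
    # Logistic mean model (an example choice, not required by the theory)
    return 1.0 / (1.0 + np.exp(-(X @ beta)))

def score_terms(beta, X, Y, W, theta_pseudo):
    """Per-subject estimating-equation terms (64)-(66) with A(beta; X) = X."""
    m = mu(beta, X)
    u_ind = X * (W * (Y - m))[:, None]        # (64): weight the whole residual
    u_out = X * (W * Y - m)[:, None]          # (65): weight the outcome only
    u_pse = X * (theta_pseudo - m)[:, None]   # (66): plug in pseudo-observations
    return u_ind, u_out, u_pse

# Sanity check: with unit weights and pseudo-observations equal to Y
# (no censoring), the three sets of terms coincide.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(4), rng.normal(size=4)])
Y = np.array([1.0, 0.0, 1.0, 0.0])
u1, u2, u3 = score_terms(np.zeros(2), X, Y, np.ones(4), Y)
```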
The next lemma concerns an approximation of these terms. In the remainder of this appendix, a set of regularity conditions is used to establish the desired properties. A sufficient set of conditions involves: the positivity requirement \(\tilde{S}(t \mathbin {\mid }Z) > 0\) (almost surely); a bounded outcome Y; two times continuous differentiability of \(\beta \mapsto A(\beta ; X)\) and \(\beta \mapsto A(\beta ; X) \mu (\beta ; X)\) (almost surely); locally dominated integrability of the second order derivatives of these functions; integrability of the first order derivative; finite second moments of \(A(\beta ; X)\) and \(A(\beta ; X) \mu (\beta ;X)\); and non-singularity of relevant matrices in the following. Many of these conditions are to hold at, or in an open neighborhood of, the relevant \(\beta \).
Lemma 3
Under the mentioned regularity conditions, each of the three types allows for a representation
$$\begin{aligned} u_{n, i,type }(\beta ) = u_{type }(\bar{\mathscr{X}}_i; \beta ) + \frac{1}{n} \sum _{j=1}^n \dot{u}_{type }(\bar{\mathscr{X}}_i; \bar{\mathscr{X}}_j; \beta )+ R_{n, i, type }(\beta ) \end{aligned}$$
(67)
where
1.
\(\operatorname {E}(\dot{u}_{type }(\bar{\mathscr{X}}_i; \bar{\mathscr{X}}_j; \beta ) \mathbin {\mid }\bar{\mathscr{X}}_i) = 0\) (almost surely) for \(j \ne i\).
 
2.
\(\Vert R_{n,i, type }(\beta ) \Vert \le r_{type }(\bar{\mathscr{X}}_i;\beta ) Q_{n,type }\) for a positive and locally dominated square integrable function \(r_{type }(\bar{\mathscr{X}}_i; \beta )\) and a sequence \((Q_{n,type })\) such that \(Q_{n,type } = o_{\operatorname {P}}(n^{-1/2})\) as \(n \rightarrow \infty \).
 
Specifically,
$$\begin{aligned} u_\text {ind}(\bar{\mathscr{X}}; \beta )&= A(\beta ; X) W^*(Y- \mu (\beta ;X)), \\ u_\text {out}(\bar{\mathscr{X}}; \beta )&= A(\beta ; X) (W^*Y- \mu (\beta ;X)), \\ u_\text {pse}(\bar{\mathscr{X}}; \beta )&= A(\beta ; X) (W^*Y - \mu (\beta ;X) + \int _0^{t-} e^*(u \mathbin {\mid }Z) M(\tilde{\mathscr{X}}; \textrm{d} \,u \mathbin {\mid }Z)), \end{aligned}$$
and
$$\begin{aligned} \dot{u}_\text {ind}(\bar{\mathscr{X}}_1, \bar{\mathscr{X}}_2;\beta )&= A(\beta ; X_1) \int _0^{t-} W_1^*(Y_1 - \mu (\beta ; X_1)) \frac{\mathbbm {1}(\tilde{T}_1> u) \mathbbm {1}(Z_1=Z_2)}{\tilde{S}(u \mathbin {\mid }Z_2) p_Z(Z_2)} \\&M(\tilde{\mathscr{X}}_2; \textrm{d} \,u \mathbin {\mid }Z_2), \\ \dot{u}_\text {out}(\bar{\mathscr{X}}_1, \bar{\mathscr{X}}_2;\beta )&= A(\beta ; X_1) \int _0^{t-} W_1^* Y_1 \frac{\mathbbm {1}(\tilde{T}_1 > u) \mathbbm {1}(Z_1=Z_2)}{\tilde{S}(u \mathbin {\mid }Z_2) p_Z(Z_2)} M(\tilde{\mathscr{X}}_2; \textrm{d} \,u \mathbin {\mid }Z_2), \\ \dot{u}_\text {pse}(\bar{\mathscr{X}}_1, \bar{\mathscr{X}}_2;\beta )&= A(\beta ; X_1) \ddot{\theta }(\tilde{\mathscr{X}}_1, \tilde{\mathscr{X}}_2). \end{aligned}$$
Proof
For types ind and out this is an application of Lemma 1 where \(r_\textit{ind}(\bar{\mathscr{X}}; \beta ) = \Vert A(\beta ;X)(Y-\mu (\beta ;X)) \mathbbm {1}(C \ge T \wedge t)\Vert \) and \(r_\textit{out}(\bar{\mathscr{X}}; \beta ) = \Vert A(\beta ;X) Y \mathbbm {1}(C \ge T \wedge t)\Vert \) can be used and where \(Q_{n,type }\) can be the remainder from Lemma 1 with \(s=t\). Concerning the pse type, the approximation of the pseudo-observations
$$\begin{aligned} \hat{\theta }_i = \theta + \dot{\theta }(\tilde{\mathscr{X}}_i) + \frac{1}{n} \sum _{j=1}^n \ddot{\theta }(\tilde{\mathscr{X}}_i, \tilde{\mathscr{X}}_j) + o_{\operatorname {P}}(n^{-1/2}) \end{aligned}$$
(68)
is uniform in i according to Proposition 3.1 of Overgaard et al. (2017) using the functional approach of Lemma 2. Lemma 2 also gives the expression of \(\dot{\theta }\) and \(\ddot{\theta }\) used in the statement. Above, \(\theta = \operatorname {E}(W^* Y)\) is the limit of the estimator. Concretely, take \(r_\textit{pse}(\bar{\mathscr{X}}; \beta ) = \Vert A(\beta ;X) \Vert \) and let \(Q_{n,\textit{pse}}\) be the remainder from (68). The property that \(\operatorname {E}(\dot{u}_{type }(\bar{\mathscr{X}}_i; \bar{\mathscr{X}}_j; \beta ) \mathbin {\mid }\bar{\mathscr{X}}_i) = 0\) for \(j \ne i\) is essentially a result of properties of the influence functions, but can be checked using primarily that \(\operatorname {E}(M(\tilde{\mathscr{X}}; s \mathbin {\mid }Z) \mathbin {\mid }Z) = 0\) as well as independence of observations. For the pse case, it is worth noting that \(\operatorname {E}(R(\tilde{\mathscr{X}};s; z)) = \check{S}(s \mathbin {\mid }z) p_Z(z)\) and that
$$\begin{aligned} \operatorname {E}\big (W^* Y \frac{\mathbbm {1}(\tilde{T} > s) \mathbbm {1}(Z_1 = z)}{\tilde{S}(s \mathbin {\mid }z) p_Z(z)} \big ) = e^*(s \mathbin {\mid }z) \end{aligned}$$
(69)
basically by definition. \(\square \)
With the definitions of Lemma 3, let for each of the three types
$$\begin{aligned} h_{type }(\bar{\mathscr{X}}; \beta ) = u_{type }(\bar{\mathscr{X}}; \beta ) + \operatorname {E}(\dot{u}_{type }(\bar{\mathscr{X}}_1; \bar{\mathscr{X}}; \beta ) \mathbin {\mid }\bar{\mathscr{X}}). \end{aligned}$$
(70)
Lemma 4
Under the regularity conditions mentioned above there is, for each of the three types, an asymptotic equivalence of \(U_{n,type }(\beta ) = \sum _i u_{n,i,type }(\bar{\mathscr{X}}_i;\beta )\) and
$$\begin{aligned} U^*_{n,type }(\beta ) = \sum _{i=1}^n h_{type }(\bar{\mathscr{X}}_i; \beta ) \end{aligned}$$
(71)
in the sense that \(n^{-1/2}(U_{n,type }(\beta )-U^*_{n,type }(\beta )) \rightarrow 0\) in probability as \(n \rightarrow \infty \).
Proof
This is a U-statistic or rather V-statistic argument, which applies owing to the approximations of Lemma 3. The \(U_{n,type }(\beta )\) is well approximated by a V-statistic of order 2. Symmetrized, the V-statistic has the kernel function
$$\begin{aligned} \begin{aligned} k(\bar{\mathscr{X}}_1, \bar{\mathscr{X}}_2; \beta )&= \frac{1}{2} \big (u_{type }(\bar{\mathscr{X}}_1; \beta ) + u_{type }(\bar{\mathscr{X}}_2; \beta ) \\&+ \dot{u}_{type }(\bar{\mathscr{X}}_1, \bar{\mathscr{X}}_2;\beta ) + \dot{u}_{type }(\bar{\mathscr{X}}_2, \bar{\mathscr{X}}_1;\beta )\big ). \end{aligned} \end{aligned}$$
(72)
Let \(\gamma = \operatorname {E}(k(\bar{\mathscr{X}}, \bar{\mathscr{X}}_2; \beta ))\). Using the properties from Lemma 3, it can be seen that \(\operatorname {E}(\dot{u}_{type }(\bar{\mathscr{X}}_1, \bar{\mathscr{X}}_2;\beta )) = 0\), such that \(\gamma = \operatorname {E}(u_{type }(\bar{\mathscr{X}}; \beta ))\), and then
$$\begin{aligned} \begin{aligned} 2 ( \operatorname {E}(k(\bar{\mathscr{X}}, \bar{\mathscr{X}}_2; \beta ) \mathbin {\mid }\bar{\mathscr{X}}) - \gamma )&= u_{type }(\bar{\mathscr{X}}; \beta ) + \operatorname {E}(\dot{u}_{type }(\bar{\mathscr{X}}_1; \bar{\mathscr{X}}; \beta ) \mathbin {\mid }\bar{\mathscr{X}}) - \gamma \\&= h_{type }(\bar{\mathscr{X}}; \beta ) - \gamma . \end{aligned} \end{aligned}$$
(73)
Since \(k(\bar{\mathscr{X}}_1, \bar{\mathscr{X}}_2; \beta )\) will have finite second moment, as can be seen from the regularity conditions, the claim follows from results on V-statistics. See for instance Theorem 12.3 and Problem 12.10 of van der Vaart (1998). \(\square \)
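The projection step in this V-statistic argument can be illustrated numerically with a toy kernel, \(k(x_1, x_2) = x_1 x_2\) (a hypothetical example, not the kernel from the proof): the V-statistic minus its first-order projection vanishes faster than \(n^{-1/2}\).

```python
import numpy as np

rng = np.random.default_rng(7)
mu_x = 1.0
diffs = []
for n in (100, 10_000):
    x = rng.normal(loc=mu_x, scale=1.0, size=n)
    # V-statistic with kernel k(x1, x2) = x1*x2: (1/n^2) * sum_{i,j} x_i x_j
    V = x.sum() ** 2 / n ** 2
    # First-order projection: gamma + (2/n) * sum_i (E(k | x_i) - gamma),
    # with gamma = mu_x^2 and E(k | x_i) = x_i * mu_x
    proj = mu_x ** 2 + 2 * mu_x * (x.mean() - mu_x)
    diffs.append(np.sqrt(n) * abs(V - proj))
# Here V - proj = (xbar - mu_x)^2, so sqrt(n) * |V - proj| is O_P(n^{-1/2}).
```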
In the following lemma, the regularity conditions are used to ensure, with high probability, the existence of a solution to the estimating equation in each of the three cases. This can be done since the regularity conditions imply similar properties of \(\beta \mapsto h_{type }(\bar{\mathscr{X}}; \beta )\).
Lemma 5
For any one of the three types, suppose \(\beta _{type }^*\) exists such that \(\operatorname {E}(h_{type }(\bar{\mathscr{X}}; \beta _{type }^*)) = 0\). Under the regularity conditions mentioned above, a sequence \((\hat{\beta }_{n,type })\) exists such that \(U_{n,type }(\hat{\beta }_{n,type }) = 0\) with a probability tending to 1 and \(\hat{\beta }_{n,type } \rightarrow \beta _{type }^*\) in probability, and further
$$\begin{aligned} \hat{\beta }_{n,type } - \beta _{type }^* = \frac{1}{n} \sum _{i=1}^n \dot{\beta }_{type }(\bar{\mathscr{X}}_i) + o_{\operatorname {P}}(n^{-\frac{1}{2}}) \end{aligned}$$
(74)
where
$$\begin{aligned} \dot{\beta }_{type }(\bar{\mathscr{X}}) = - J_{type }(\beta _{type }^*)^{-1} h_{type }(\bar{\mathscr{X}}; \beta _{type }^*) \end{aligned}$$
(75)
and
$$\begin{aligned} J_{type }(\beta ) = \operatorname {E}(\frac{\partial }{\partial \beta ^{\textsf{T}}} h_{type }(\bar{\mathscr{X}}; \beta )). \end{aligned}$$
(76)
Proof
On the basis of Lemma 4, Theorem 5.41 and Theorem 5.42 of van der Vaart (1998) can be invoked for this result. Strictly speaking, Theorem 5.42 ensures the existence of a solution to \(U_{n,type }^*(\beta ) = 0\) with high probability for large n, but an inspection of its proof reveals that the same applies to a solution to \(U_{n,type }(\beta ) = 0\), owing to the close approximation by the V-statistic considered in the proof of Lemma 4, for which a uniform law of large numbers applies, and to the well-behaved remainder terms \(r_{type }(\bar{\mathscr{X}}_i; \beta )\). \(\square \)
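As an illustrative aside, not part of the proof: for a weighted logistic specification, an estimating equation of the form \(U_n(\beta ) = 0\) as in Lemma 5 can be solved by Newton iteration. The following minimal Python sketch uses synthetic data; the function name and setup are hypothetical, and with all weights equal to 1 it reduces to ordinary logistic scoring.

```python
import numpy as np

def solve_weighted_logistic(X, y, w, n_iter=25):
    """Newton iterations for the weighted logistic score
    U_n(beta) = (1/n) sum_i w_i x_i (y_i - mu(beta; x_i)) = 0."""
    beta = np.zeros(X.shape[1])
    n = len(y)
    for _ in range(n_iter):
        mu = 1.0 / (1.0 + np.exp(-X @ beta))            # mu(beta; x)
        score = X.T @ (w * (y - mu)) / n                # U_n(beta)
        hw = w * mu * (1.0 - mu)                        # Hessian weights
        hess = -(X * hw[:, None]).T @ X / n             # dU_n / dbeta^T
        beta = beta - np.linalg.solve(hess, score)      # Newton step
    return beta

# Synthetic, uncensored illustration: all weights equal to 1
rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0]))))
y = rng.binomial(1, p_true).astype(float)
w = np.ones(n)
beta_hat = solve_weighted_logistic(X, y, w)
```

In a censored-data application, w would hold the inverse probability of censoring weights and the rows with unobserved status would enter with weight 0.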
The notation
$$\begin{aligned} u({\mathscr{X}};\beta ) = A(\beta ; X)(Y-\mu (\beta ;X)) \end{aligned}$$
(77)
relating to the estimating equation of the uncensored problem is used in the following.
Proposition 1
In the setting of the previous lemma, suppose \(\beta ^*\) is a solution to the uncensored problem, \(\operatorname {E}(u({\mathscr{X}}; \beta )) = 0\) and \(\beta _{type }^*\) exists such that \(\operatorname {E}(h_{type }(\bar{\mathscr{X}}; \beta _{type }^*)) = 0\) for one of the three types. Assuming invertibility of \(J_{type }\) at \(\beta ^*\), a first order approximation of the bias is
$$\begin{aligned} \begin{aligned} \beta ^*_{type } - \beta ^*&\approx -J_{type }(\beta ^*)^{-1} \operatorname {E}(h_{type }(\bar{\mathscr{X}}; \beta ^*)) \\&= \operatorname {E}(\int _0^{t-} \psi _{type }({\mathscr{X}}; s; \beta ^*) \frac{1}{G^*(s \mathbin {\mid }Z)} M(\tilde{\mathscr{X}}; \textrm{d} \,s \mathbin {\mid }Z) ) \end{aligned} \end{aligned}$$
(78)
where, with the notation \(B_{type }(\beta ; X) = J_{type }(\beta )^{-1} A(\beta ; X)\) and \(W^*(s) = \frac{\mathbbm {1}(C \ge T \wedge t)}{G^*(T \wedge t \mathbin {\mid }Z)/G^*(s \mathbin {\mid }Z)}\),
$$\begin{aligned} \psi _\textit{ind}({\mathscr{X}}; s; \beta )&= B_{\textit{ind}}(\beta ; X)(Y - \mu (\beta ; X)), \end{aligned}$$
(79)
$$\begin{aligned} \psi _\textit{out}({\mathscr{X}}; s; \beta )&= B_{\textit{out}}(\beta ; X)Y, \end{aligned}$$
(80)
$$\begin{aligned} \psi _\textit{pse}({\mathscr{X}}; s; \beta )&= B_{\textit{pse}}(\beta ; X)(Y - \operatorname {E}(W^*(s) Y \mathbin {\mid }\tilde{T} > s, Z)). \end{aligned}$$
(81)
Under an assumption of conditional independence of the censoring time \(C\) and \((T, D, X)\) given \(Z\), this leads to
$$\begin{aligned} \begin{aligned}&\beta ^*_{type } - \beta ^* \approx \operatorname {E}\big (\int _0^{t-} \operatorname {E}(\psi _{type }({\mathscr{X}}; s; \beta ^*) \mathbin {\mid }T > s, X) S(s \mathbin {\mid }X) \\ &\frac{G(s- \mathbin {\mid }X)}{G^*(s \mathbin {\mid }Z)} (\Lambda (\textrm{d} \,s \mathbin {\mid }X) - \Lambda ^*(\textrm{d} \,s \mathbin {\mid }Z))\big ). \end{aligned} \end{aligned}$$
(82)
Proof
This result is based on a Taylor approximation of \(\beta \mapsto \operatorname {E}(h_{type }(\bar{\mathscr{X}}; \beta ))\) around \(\beta ^*\) which would reveal
$$\begin{aligned} 0 = \operatorname {E}(h_{type }(\bar{\mathscr{X}}; \beta ^*_{type })) = \operatorname {E}(h_{type }(\bar{\mathscr{X}}; \beta ^*)) + J_{type }(\beta ^*) (\beta _{type }^* - \beta ^*) + Rem . \end{aligned}$$
(83)
and thereby the stated first order approximation. Here, \(\operatorname {E}(h_{type }(\bar{\mathscr{X}}; \beta ^*)) = \operatorname {E}(u_{type }(\bar{\mathscr{X}}; \beta ^*) - u({\mathscr{X}}; \beta ^*))\) since the higher order term has mean 0 and \(\beta ^*\) solves \(\operatorname {E}(u({\mathscr{X}}; \beta )) = 0\). The equality \(W^*-1 = - \int _0^{T \wedge t-} \frac{1}{G^*(s \mathbin {\mid }Z)} M(\tilde{\mathscr{X}}; \textrm{d} \,s \mathbin {\mid }Z)\) now establishes the second expression of the approximation. The conditional independence assumption makes it possible to first take the expectation given \((T, D, X)\), which establishes the structure of the integrator in the display, and then the conditional expectation given \(X\) only. \(\square \)
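As an illustrative aside, the basic inverse probability of censoring weight \(\mathbbm {1}(C \ge T \wedge t)/G(T \wedge t- \mathbin {\mid }Z)\) underlying these expressions can be made concrete with a Kaplan–Meier estimate of the censoring distribution. The following minimal Python sketch is unstratified (the covariate \(Z\) is ignored), assumes no ties in the observed times, and uses hypothetical function names.

```python
import numpy as np

def km_censoring_survival(time, event):
    """Kaplan-Meier estimate of the censoring survival function G(s),
    obtained by treating censorings (event == 0) as the events of interest.
    Assumes no ties in the observed times."""
    order = np.argsort(time)
    time = np.asarray(time, float)[order]
    cens = 1 - np.asarray(event, int)[order]
    n = len(time)
    surv = 1.0
    times, survs = [0.0], [1.0]
    for i in range(n):
        if cens[i]:
            surv *= 1.0 - 1.0 / (n - i)   # n - i subjects still at risk
        times.append(time[i])
        survs.append(surv)
    return np.array(times), np.array(survs)

def ipcw_weights(time, event, t):
    """Weights 1(status at t observed) / G((T ^ t)-) with G estimated by
    Kaplan-Meier; no stratification on covariates is attempted here."""
    time = np.asarray(time, float)
    event = np.asarray(event, int)
    times, survs = km_censoring_survival(time, event)
    def G_minus(s):  # left limit G(s-) of the step function
        idx = np.searchsorted(times, s, side="left") - 1
        return survs[max(idx, 0)]
    t_wedge = np.minimum(time, t)
    observed = (event == 1) | (time >= t)  # status at t is known
    return observed / np.array([G_minus(s) for s in t_wedge])

# Toy data: events at 1 and 3, censorings at 2 and 4, horizon t = 5
w = ipcw_weights([1.0, 2.0, 3.0, 4.0], [1, 0, 1, 0], t=5.0)
```

Subjects censored before \(t\) receive weight 0, while observed subjects are up-weighted by the inverse of the estimated censoring survival just before their observed time.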
Lemma 6
Consider the setting of Lemma 5. Under the independent censoring assumption, Assumption 1, each of the three types allows for the representation
$$\begin{aligned} h_{type }(\bar{\mathscr{X}};\beta ) = u({\mathscr{X}};\beta ) - \int _0^{t-} \frac{v_{type }({\mathscr{X}}; s; \beta ) - e_{type }(\beta ; s \mathbin {\mid }Z)}{G(s \mathbin {\mid }Z)} M(\tilde{\mathscr{X}}; \textrm{d} \,s \mathbin {\mid }Z) \end{aligned}$$
(84)
where
$$\begin{aligned} v_\text {ind}({\mathscr{X}};s;\beta )&= A(\beta ; X)(Y - \mu (\beta ; X)), \end{aligned}$$
(85)
$$\begin{aligned} v_\text {out}({\mathscr{X}};s;\beta )&= A(\beta ; X)Y, \end{aligned}$$
(86)
$$\begin{aligned} v_\text {pse}({\mathscr{X}};s;\beta )&= A(\beta ; X)(Y - \operatorname {E}(Y \mathbin {\mid }T > s, Z)), \end{aligned}$$
(87)
and where \(e_{type }(\beta ; s\mathbin {\mid }z) = \operatorname {E}(v_{type }({\mathscr{X}}; s; \beta ) \mathbin {\mid }T > s, Z=z)\). Additionally, under Assumption 1,
$$\begin{aligned} J_{type }(\beta ) = \operatorname {E}\big ( \frac{\partial }{\partial \beta ^{\textsf{T}}} A(\beta ; X) (Y- \mu (\beta ; X)) - A(\beta ; X) \frac{\partial }{\partial \beta ^{\textsf{T}}} \mu (\beta ; X) \big ) \end{aligned}$$
(88)
for each of the three types, and so, \(J_{type }(\beta )\) does not depend on the type.
Proof
It can be seen that
$$\begin{aligned} W^* - 1 = - \int _0^{t-} \frac{1}{G(s \mathbin {\mid }Z)} M(\tilde{\mathscr{X}}; \textrm{d} \,s \mathbin {\mid }Z) \end{aligned}$$
(89)
and
$$\begin{aligned} e^*(s \mathbin {\mid }Z) = \frac{\operatorname {E}(Y \mathbin {\mid }T > s, Z)}{G(s \mathbin {\mid }Z)} \end{aligned}$$
(90)
under the assumption. This reveals how the structure involving \(u({\mathscr{X}};\beta )\) and \(v_{type }({\mathscr{X}};s;\beta )\) arises from \(u_{type }(\bar{\mathscr{X}}; \beta )\) under the assumption. The remaining part concerning \(e_{type }(\beta ; s \mathbin {\mid }Z)\), on the other hand, comes from \(\operatorname {E}(\dot{u}_{type }(\bar{\mathscr{X}}_1, \bar{\mathscr{X}}; \beta ) \mathbin {\mid }\bar{\mathscr{X}})\). For types ind and out, the structure follows since, for any suitable function \(f\) of the underlying information \({\mathscr{X}}\),
$$\begin{aligned} \begin{aligned}&\operatorname {E}\big ( f({\mathscr{X}}) W \frac{\mathbbm {1}(\tilde{T}> s) \mathbbm {1}(Z = z)}{\tilde{S}(s \mathbin {\mid }z) p_Z(z)} \big ) \\&= \operatorname {E}\big (f({\mathscr{X}}) \frac{\mathbbm {1}(C \ge T \wedge t)}{G(T \wedge t - \mathbin {\mid }Z)} \mathbin {\mid }\tilde{T}> s, Z=z \big ) \\&= \operatorname {E}\big (f({\mathscr{X}}) \frac{\operatorname {P}(C \ge T \wedge t \mathbin {\mid }C> s, Z=z, T)}{G(T \wedge t - \mathbin {\mid }Z)} \mathbin {\mid }\tilde{T}> s, Z=z \big ) \\&= \frac{\operatorname {E}(f({\mathscr{X}}) \mathbin {\mid }\tilde{T}> s, Z=z)}{G(s \mathbin {\mid }z)} \\&= \frac{\operatorname {E}(f({\mathscr{X}}) \mathbin {\mid }T > s, Z=z)}{G(s \mathbin {\mid }z)}, \end{aligned} \end{aligned}$$
(91)
where the independent censoring assumption is used in the last two equalities. To handle the pse case, also note that \(\operatorname {E}(M(\tilde{\mathscr{X}};s \mathbin {\mid }z) \mathbin {\mid }{\mathscr{X}}) = 0\) under the assumption, and thus \(\operatorname {E}(\dot{\Lambda }(\tilde{\mathscr{X}};s \mathbin {\mid }z) \mathbin {\mid }{\mathscr{X}}) = 0\) as well. This takes care of most of the terms from \(\ddot{\theta }\) in \(\dot{u}_\textit{pse}\). One remaining term, involving \(A(\beta _0; X) W Y\), follows the structure from above. For the last remaining term, note that
$$\begin{aligned} \begin{aligned}&\frac{\operatorname {E}(A(\beta _0;X) e^*(s \mathbin {\mid }z) R(\tilde{\mathscr{X}};s;z))}{\check{S}(s \mathbin {\mid }z) p_Z(z)} \\&= \frac{\operatorname {E}(A(\beta _0;X) e^*(s \mathbin {\mid }z) \mathbbm {1}(T> s) \mathbbm {1}(Z=z) \operatorname {P}(C \ge s \mathbin {\mid }{\mathscr{X}}))}{S(s \mathbin {\mid }z) G(s- \mathbin {\mid }z) p_Z(z)} \\&= \operatorname {E}(A(\beta _0; X) e^*(s \mathbin {\mid }z) \mathbin {\mid }T> s, Z=z) \\&= \frac{\operatorname {E}(A(\beta _0; X) \operatorname {E}(Y \mathbin {\mid }T> s, Z=z) \mathbin {\mid }T > s, Z=z)}{G(s \mathbin {\mid }z)} \end{aligned} \end{aligned}$$
(92)
since \(\operatorname {P}(C \ge s \mathbin {\mid }{\mathscr{X}}) = G(s- \mathbin {\mid }z)\) on \(Z=z\) under the independent censoring assumption. The structure of (84) and the fact that \(\operatorname {E}(M(\tilde{\mathscr{X}};s \mathbin {\mid }z) \mathbin {\mid }{\mathscr{X}}) = 0\) reveals that
$$\begin{aligned} J_{type }(\beta ) = \operatorname {E}(\frac{\partial }{\partial \beta ^{\textsf{T}}} h_{type }(\bar{\mathscr{X}}; \beta )) = \operatorname {E}(\frac{\partial }{\partial \beta ^{\textsf{T}}} u({\mathscr{X}}; \beta )) \end{aligned}$$
(93)
which will have the desired expression and does not depend on the type of approach. \(\square \)
It may also be noted that, in Lemma 6, if the model \(\operatorname {E}(Y \mathbin {\mid }X) = \mu (\beta _0;X)\) holds, then, at the true \(\beta = \beta _0\),
$$\begin{aligned} J_{type }(\beta ) = - \operatorname {E}\big (A(\beta ; X) \frac{\partial }{\partial \beta ^{\textsf{T}}} \mu (\beta ; X) \big ). \end{aligned}$$
(94)
The proofs of Theorem 1 and Theorem 2 are now presented.
Proof of Theorem 1
The true \(\beta _0\) becomes the limiting solution, and the result can be established by appealing to Lemmas 5 and 6. Note that, in the notation of the theorem, \(M(\tilde{\mathscr{X}}; s \mathbin {\mid }z)\) is \(\int _0^s \mathbbm {1}(T > u) M(\textrm{d} \,u \mathbin {\mid }z)\) under the assumptions. Under Assumption 1, and using Lemma 6, this implies that \(\beta _0\) in fact solves \(\operatorname {E}(h_{type }(\bar{\mathscr{X}}; \beta _0)) = 0\) for each type, since \(\operatorname {E}(u({\mathscr{X}};\beta _0)) = 0\) and \(\operatorname {E}(M(\tilde{\mathscr{X}}; s \mathbin {\mid }Z) \mathbin {\mid }{\mathscr{X}}) = 0\) for all s. Also, as just noted, the matrix \(J_{type }(\beta _0)\) reduces to the relevant \(J(\beta _0)\) under the model assumption \(\operatorname {E}(Y \mathbin {\mid }X) = \mu (\beta _0; X)\). Using Lemma 5 now gives the desired result, and also the expression of \(\dot{\beta }_{type }\) via (84). \(\square \)
Proof of Theorem 2
The matrices \(\frac{1}{n}\frac{\partial }{\partial \beta } U_{n,type }(\beta )\), evaluated at the true \(\beta _0\), converge to \(J(\beta _0)\) under the assumptions. An application of Lemma 3 gives a close approximation of \(\frac{1}{n}\sum _{i=1}^n u_{i,n,type }(\beta ) u_{i,n,type }(\beta )^{\textsf{T}}\) by a U-statistic of order 3, which, according to the law of large numbers for U-statistics, converges to its mean. That mean is \(\operatorname {E}(u_{type }(\bar{\mathscr{X}}; \beta )^{\otimes 2})\). Following the approach of the proofs of Lemma 6 and Corollary 1 leads to the desired expression at the true \(\beta _0\). Under the mentioned regularity conditions, which ensure that these convergences are uniform in an open neighborhood of the true \(\beta _0\), evaluating at estimates that converge to \(\beta _0\) yields the same limit as evaluating at \(\beta _0\) itself. The difference
$$\begin{aligned} \Sigma _{type }' - \Sigma _{type } = \operatorname {E}(\int _0^{t-} \operatorname {E}(\phi _{type }(s; T, D, X) \mathbin {\mid }T > s, Z=z)^{\otimes 2} \frac{S(s \mathbin {\mid }Z)}{G(s \mathbin {\mid }Z)} \Lambda (\textrm{d} \,s \mathbin {\mid }Z)) \end{aligned}$$
(95)
is necessarily non-negative definite, and it is positive definite unless \(\operatorname {E}(\phi _{type }(s; T, D, X) \mathbin {\mid }T > s, Z=z) = 0\) for \(\Lambda (\cdot \mathbin {\mid }z)\)-almost all s for almost all z, as claimed, since \(S(t \mathbin {\mid }Z) > 0\) almost surely by assumption. \(\square \)
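As a final illustrative aside, the plug-in variance construction appearing in the proof, combining \(J\) with the empirical second moment \(\frac{1}{n}\sum _i u_i u_i^{\textsf{T}}\), can be sketched for a weighted logistic specification. The function below is a hypothetical minimal sketch; treating the weights as known corresponds to the conservative \(\Sigma _{type }'\)-type estimator discussed above rather than the smaller \(\Sigma _{type }\) that accounts for estimation of G.

```python
import numpy as np

def sandwich_variance(X, y, w, beta):
    """Plug-in sandwich J^{-1} M J^{-T} / n for the weighted logistic score
    u_i = w_i x_i (y_i - mu(beta; x_i)), with the weights treated as known
    (hence a conservative variance when the weights are in fact estimated)."""
    n = len(y)
    mu = 1.0 / (1.0 + np.exp(-X @ beta))
    u = X * (w * (y - mu))[:, None]                      # per-subject scores
    J = -(X * (w * mu * (1.0 - mu))[:, None]).T @ X / n  # dU_n / dbeta^T
    Jinv = np.linalg.inv(J)
    meat = u.T @ u / n                                   # estimate of E(u u^T)
    return Jinv @ meat @ Jinv.T / n                      # approx Var(beta_hat)

# Synthetic uncensored illustration, evaluated at the true parameter
rng = np.random.default_rng(1)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0]))))
y = rng.binomial(1, p_true).astype(float)
V = sandwich_variance(X, y, np.ones(n), np.array([-0.5, 1.0]))
```

The resulting matrix is symmetric and positive definite, and its diagonal gives approximate squared standard errors for the components of the estimate.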
References
Andersen PK, Pohar Perme M (2010) Pseudo-observations in survival analysis. Stat Methods Med Res 19(1):71–99. https://doi.org/10.1177/0962280209105020
Andersen PK, Borgan Ø, Gill RD, Keiding N (1993) Statistical models based on counting processes. Springer series in statistics. Springer, New York. https://doi.org/10.1007/978-1-4612-4348-9
Andersen PK, Klein JP, Rosthøj S (2003) Generalised linear models for correlated pseudo-observations, with applications to multi-state models. Biometrika 90(1):15–27. https://doi.org/10.1093/biomet/90.1.15
Bang H, Tsiatis A (2000) Estimating medical costs with censored data. Biometrika 87(2):329–343. https://doi.org/10.1093/biomet/87.2.329
Binder N, Gerds TA, Andersen PK (2014) Pseudo-observations for competing risks with covariate dependent censoring. Lifetime Data Anal 20(2):303–315. https://doi.org/10.1007/s10985-013-9247-7
Blanche PF, Holt A, Scheike T (2023) On logistic regression with right censored data, with or without competing risks, and its use for estimating treatment effects. Lifetime Data Anal 29(2):441–482. https://doi.org/10.1007/s10985-022-09564-6
Martinussen T, Scheike TH (2023) Efficient \(t_0\)-year risk regression using the logistic model. Scand J Stat 50(4):1919–1932. https://doi.org/10.1111/sjos.12658
Overgaard M, Parner ET, Pedersen J (2017) Asymptotic theory of generalized estimating equations based on jack-knife pseudo-observations. Ann Statist 45(5):1988–2015. https://doi.org/10.1214/16-AOS1516
Overgaard M, Parner ET, Pedersen J (2018) Estimating the variance in a pseudo-observation scheme with competing risks. Scand J Stat 45(4):923–940. https://doi.org/10.1111/sjos.12328
Overgaard M, Parner ET, Pedersen J (2019) Pseudo-observations under covariate-dependent censoring. J Stat Plan Inference 202:112–122. https://doi.org/10.1016/j.jspi.2019.02.003
Parner ET, Andersen PK, Overgaard M (2023) Regression models for censored time-to-event data using infinitesimal jack-knife pseudo-observations, with applications to left-truncation. Lifetime Data Anal 29(3):654–671. https://doi.org/10.1007/s10985-023-09597-5
Robins JM, Rotnitzky A (1992) Recovery of information and adjustment for dependent censoring using surrogate markers. In: Jewell NP, Dietz K, Farewell VT (eds) AIDS epidemiology: methodological issues. Birkhäuser Boston, Boston, pp 297–331
Sandqvist OL (2024) Doubly robust inference with censoring unbiased transformations. arXiv preprint arXiv:2411.04909
Satten GA, Datta S (2001) The Kaplan–Meier estimator as an inverse-probability-of-censoring weighted average. Am Stat 55(3):207–210. https://doi.org/10.1198/000313001317098185
Scheike TH, Zhang MJ, Gerds TA (2008) Predicting cumulative incidence probability by direct binomial regression. Biometrika 95(1):205–220. https://doi.org/10.1093/biomet/asm096
Stute W, Wang JL (1994) The jackknife estimate of a Kaplan–Meier integral. Biometrika 81(3):602–606. https://doi.org/10.1093/biomet/81.3.602
van der Vaart AW (1998) Asymptotic statistics. Cambridge series in statistical and probabilistic mathematics. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511802256