Abstract
In clinical research, estimating the average treatment effect is a common goal. However, when treatment effects vary substantially across individuals, it is often more informative to evaluate the treatment effect within subgroups. This paper focuses on causal inference for a duration outcome in a principal stratum, defined as the subgroup of individuals who would experience a positive duration under one treatment. Motivated by the Danish Vulva Cancer Recurrence Study (DaVulvaRec), which compares intensive versus standard follow-up in women treated for vulvar cancer, we examine the effect of intensive follow-up on the time lived with a cancer recurrence diagnosis. In this example, the principal stratum consists of women who would be diagnosed with cancer recurrence under intensive follow-up. We present a framework for identifying and estimating the average treatment effect in the principal stratum under a monotonicity assumption and introduce a sensitivity parameter to evaluate the impact of potential violations of this assumption. Using a multi-state model with pseudo-observations, we account for censoring and demonstrate that this approach offers greater statistical power than conventional comparisons between treatment groups. We illustrate the methodology with a sample size calculation and the final analysis of the DaVulvaRec study using a simulated data set, and with an application to data from a randomized study on colon cancer.
1 Introduction
Clinical studies often aim to estimate the average treatment effect. However, when the effect of a treatment varies among individuals, conditional estimates may become more relevant (Hauck et al. 1998; Harrell 2021). In some cases, an intervention is specifically designed to benefit a particular subgroup of patients. For example, the current paper was motivated by the planning of a clinical study investigating the effect of intensive follow-up compared to standard follow-up for women treated for vulvar cancer, the Danish Vulva Cancer Recurrence Study (DaVulvaRec), registered at https://clinicaltrials.gov under NCT06495554. The primary outcome was the diagnosis of cancer recurrence within two years of follow-up. The hypothesis was that intensive follow-up, which included regular blood tests and symptom questionnaires, would result in more women being diagnosed with recurrence and that the diagnosis would occur earlier. At the start of the study, all participants have the potential to benefit from the intervention, making the average treatment effect—such as the overall increase in recurrence diagnoses—an initially relevant measure.
However, the direct benefit of the intervention applies only to women who actually experience a recurrence. For these patients, it is also important to quantify the treatment effect within the subgroup eligible for direct benefit. A key outcome for this group is the duration of time living with a recurrence diagnosis, conceptualized as an extended treatment window.
Estimating average treatment effects within covariate-defined subgroups is common in the causal inference literature (Hernán and Robins 2020), and these are referred to as conditional average treatment effects. When subgroups are defined based on variables observed after the start of the clinical study, the framework of principal stratification applies (Frangakis and Rubin 2002). In this context, such subgroups are referred to as principal strata. The ICH E9 (R1) addendum proposed principal stratification as one of five strategies for dealing with intercurrent events after the start of the clinical study (ICH E9 (R1) 2020).
In this work, we define principal strata based on potential outcomes in one of the treatment groups. Specifically, we focus on the average treatment effect for a duration outcome in the principal stratum of individuals who would experience a positive duration in the treatment group with the highest event rate. We show that this causal effect can be identified from the observed data under a monotonicity assumption and introduce a sensitivity parameter to evaluate the impact of potential violations of this assumption. In the DaVulvaRec study, the causal effect of interest pertains to patients diagnosed with recurrence under intensive follow-up. The monotonicity assumption posits that any patient diagnosed under standard follow-up would also be diagnosed under intensive follow-up. To address censoring, we adopt a multi-state modeling framework and employ censored multi-state models with pseudo-observations. The proposed methodology is illustrated through a sample size calculation and a simulated final analysis for the DaVulvaRec study, as well as an example using data from a randomized trial on colon cancer.
2 Method
Let T denote the time from an event of interest (e.g., diagnosis of disease) to either the end of follow-up at time \(\tau\) or death, whichever occurs first. If the event does not occur, we set \(T=0\). Define \(D=1(T>0)\) as an indicator of the event having occurred and let Z indicate treatment assignment, with \(Z=1\) for the treatment group and \(Z=0\) for the control group. We consider a random sample of observations from each group, denoted as \((D_1,T_1,Z_1),\ldots ,(D_n,T_n,Z_n)\). One parameter of interest relates to the number of individuals with the event of interest,
where \(n_j\), \(j=0,1\), represents the number of observations in each of the two groups.
In the context of the vulvar cancer study, D indicates a diagnosis of cancer recurrence, and T represents the duration from diagnosis until either two years (\(\tau =2\)) or death, whichever occurs first. The treatment group receives intensive follow-up, while the control group receives standard follow-up. The effect of treatment within the study timeframe includes several components: (1) patients who would have been diagnosed with recurrence under standard care within two years may experience an earlier diagnosis under intensive follow-up; (2) patients who would not have been diagnosed within two years under standard care may receive a recurrence diagnosis during the same period with intensive follow-up; and (3) patients who would have died without a recurrence diagnosis under standard care may be diagnosed with recurrence before death under intensive follow-up. Thus, the duration during which a patient lives with a recurrence diagnosis (T) captures all three components. Intensive follow-up is expected to improve outcomes for patients with recurrence. While it may not be possible to identify all patients with recurrence at two years, we can identify those diagnosed with recurrence in the intensive follow-up group and compare their outcomes to what would have occurred under standard follow-up. This type of comparison is possible in a counterfactual framework using potential outcomes. The principal stratum of patients with a diagnosis of recurrence under intensive follow-up is the largest group of patients we can identify who would benefit from the intervention.
Define (D(z), T(z)) as the potential outcomes under treatment \(Z=z\). We assume consistency (\(T(z)=T\) when \(Z=z\)) and exchangeability, \((D(z),T(z)) \perp Z\) for \(z=0,1\), which typically holds in randomized trials. The causal estimand of interest is
\[\alpha _2=\text {E}\{T(1)-T(0) \mid D(1)=1\}.\]
The subgroup defined by \(D(1)=1\) represents a principal stratum in the sense of Frangakis and Rubin (2002). This causal effect is identifiable under a monotonicity assumption: any patient who would experience the event under control would also experience it under treatment. This is formalized in Proposition 1 with the proof given in the appendix.
Since \(\gamma \le 1\), the expression in Eq. (2) generally provides a lower bound for the target parameter whenever \(\text {E}(T|Z=1)\ge \text {E}(T|Z=0)\). In this representation, the term \(\text {E}(T|Z=1)-\text {E}(T|Z=0)\) corresponds to the average treatment effect on duration for the overall population, while the factor \(1/\text {P}(D=1|Z=1)\) rescales this effect to reflect the subpopulation of interest.
Although monotonicity cannot be tested directly, it entails that the rate of positive duration is at least as high in the treatment group as in the control group,
\[\text {P}(D=1|Z=1)\ge \text {P}(D=1|Z=0),\]
a condition that can be empirically evaluated. On this basis, the principal stratum is defined using the group with the higher event rate. In settings where \(Z=1\) denotes a new treatment and \(Z=0\) corresponds to a standard or no treatment, and where the reverse monotonicity assumption holds (i.e., \(D(1)=1\) always implies \(D(0)=1\)), analogous expressions to Eqs. (1) and (2) apply, with the roles of 0 and 1 reversed.
In general, the parameter \(\gamma\) is not identifiable from the observed data and is therefore introduced as a sensitivity parameter to assess the impact of deviations from monotonicity. One approach is to impose assumptions directly on \(\gamma\). Alternatively, \(\gamma\) can be expressed as
The first ratio reflects the relative mean duration under control between individuals with positive duration in the treatment group versus those with positive duration in the control group; this component is not identifiable from the observed data. The second ratio, in contrast, captures the relative frequency of positive duration in the treatment group compared to the control group, and is identifiable. Thus, another strategy is to place assumptions specifically on the first ratio in Eq. (3).
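As a sketch of how these pieces fit together, the following derivation is our reading of the verbal description of \(\gamma\) as a product of two ratios; it is not a reproduction of the numbered equations and should be checked against Proposition 1. It uses only consistency, exchangeability, and the fact that \(T(z)=0\) whenever \(D(z)=0\).

```latex
\begin{align*}
\text{E}\{T(1)\mid D(1)=1\}
  &= \frac{\text{E}\{T(1)\}}{\text{P}(D(1)=1)}
   = \frac{\text{E}(T\mid Z=1)}{\text{P}(D=1\mid Z=1)},\\[4pt]
\gamma
  &= \frac{\text{E}\{T(0)\mid D(1)=1\}}{\text{E}\{T(0)\mid D(0)=1\}}
     \cdot\frac{\text{P}(D(1)=1)}{\text{P}(D(0)=1)}
   = \frac{\text{E}\{T(0)\,1(D(1)=1)\}}{\text{E}\{T(0)\}},\\[4pt]
\text{E}\{T(1)-T(0)\mid D(1)=1\}
  &= \frac{\text{E}(T\mid Z=1)-\gamma\,\text{E}(T\mid Z=0)}{\text{P}(D=1\mid Z=1)}.
\end{align*}
```

Under monotonicity, \(T(0)>0\) implies \(D(1)=1\), so \(\gamma =1\) and the last line reduces to the rescaled overall difference described in the text. Since \(T(0)\ge 0\), this reading also gives \(\gamma \le 1\) in general.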
In the absence of censoring, the estimate for \(\alpha _2\) under monotonicity becomes
Estimation and inference can proceed via generalized estimating equations (GEE) for the outcomes \((D_{i},T_{i})\), treating \(Z_i\) as covariate and using robust standard errors to account for correlation between \(D_{i}\) and \(T_{i}\) (Liang and Zeger 1986). The estimate and variance of \(\hat{\alpha }_2\) are subsequently obtained using the delta method.
Then \(\alpha _1=\beta _3-\beta _1\) and, assuming monotonicity, \(\alpha _2=\beta _{3}^{-1}(\beta _{4}-\beta _{2})\). It is shown in the appendix that the asymptotic variance of \(\hat{\alpha }_2\) is of the form
We will revisit sample size calculation later; for now, it is sufficient to specify the values of \(\alpha _2\), \(\sigma _0\), and \(\sigma _1\), and the sample size calculation is similar to a two-sample mean comparison with unequal standard deviations.
When \(\text {E}(T|Z=1)> \text {E}(T|Z=0)\), the second term in Eq. (5) becomes negative, implying that the variance of \(\hat{\alpha }_2\) is strictly smaller than that of \(\hat{\alpha }_2^K\). Consequently, hypothesis tests based on \(\hat{\alpha }_2\) achieve greater statistical power. The inequality \(\text {E}(T|Z=1)> \text {E}(T|Z=0)\) is guaranteed under the stronger monotonicity condition \(T(1)> T(0)\).
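The uncensored plug-in estimate and its delta-method variance can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes the reading \(\beta _2=\text {E}(T|Z=0)\), \(\beta _3=\text {P}(D=1|Z=1)\), \(\beta _4=\text {E}(T|Z=1)\), which is consistent with \(\alpha _2=\beta _{3}^{-1}(\beta _{4}-\beta _{2})\), and the function name `principal_stratum_effect` is hypothetical.

```python
import math

def principal_stratum_effect(t0, t1):
    """Plug-in estimate of alpha_2 with a delta-method standard error.

    t0, t1: durations T in the control and treatment group (0 means no event).
    Assumes no censoring and monotonicity.
    """
    n0, n1 = len(t0), len(t1)
    d1 = [1.0 if t > 0 else 0.0 for t in t1]  # D = 1(T > 0), treatment group
    b2 = sum(t0) / n0                          # mean duration, control
    b3 = sum(d1) / n1                          # event proportion, treatment
    b4 = sum(t1) / n1                          # mean duration, treatment
    alpha2 = (b4 - b2) / b3

    # Sample variances and covariance (divisor n - 1).
    var_t0 = sum((t - b2) ** 2 for t in t0) / (n0 - 1)
    var_d1 = sum((d - b3) ** 2 for d in d1) / (n1 - 1)
    var_t1 = sum((t - b4) ** 2 for t in t1) / (n1 - 1)
    cov_dt = sum((d - b3) * (t - b4) for d, t in zip(d1, t1)) / (n1 - 1)

    # Gradient of alpha2 in (b3, b4); the derivative in b2 is -1 / b3.
    g3, g4 = -(b4 - b2) / b3 ** 2, 1.0 / b3
    sigma0_sq = var_t0 / b3 ** 2
    sigma1_sq = g3 ** 2 * var_d1 + 2 * g3 * g4 * cov_dt + g4 ** 2 * var_t1

    # Variance of the form sigma0^2 / n0 + sigma1^2 / n1.
    se = math.sqrt(sigma0_sq / n0 + sigma1_sq / n1)
    return alpha2, se

# Toy data (made up for illustration only).
estimate, std_err = principal_stratum_effect([0, 0, 0, 2], [0, 1, 3, 4])
```

The two groups contribute separately to the variance because they are independent samples, which is what makes the sample size calculation resemble a two-sample mean comparison with unequal standard deviations.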
2.1 The extended illness-death model
It is often useful to model the distribution of T within a multi-state framework, which naturally accommodates censoring. For the vulvar cancer data, we adopt an extended illness-death model, as illustrated in Fig. 1. In this model, the quantity T represents the duration of time spent in state 2: Alive with a diagnosis of disease.
Let X(t) denote the state occupied by a participant at time t and define the transition probabilities as
\[P_{hj}(s,t)=\text {P}(X(t)=j \mid X(s)=h, {{\mathcal {F}}}_{s-}), \quad s\le t,\]
where \(h,j\in {{\mathcal {S}}}\), the set of all possible states, and \({{\mathcal {F}}}_{s-}=\sigma (X(u), u<s)\) is the past history of the multi-state process X. The state occupation probabilities are given by
\[Q(t)=Q(0)\,\text {P}(0,t),\]
where Q(0) is the distribution of the initial state. In the vulvar cancer application, the state space is \({{\mathcal {S}}}=\{ 1,2,3,4\}\), where 1: Alive without a diagnosis of the disease; 2: Alive with a diagnosis of the disease; 3: Death without a diagnosis of the disease; 4: Death with a diagnosis of the disease. Here all women are in the same state (1) at time 0, i.e. \(Q_1(0)=1\). Transition intensities are defined as
\[\lambda _{hj}(t)=\lim _{\Delta t\rightarrow 0}\frac{P_{hj}(t,t+\Delta t)}{\Delta t}, \quad h\ne j.\]
The process is Markovian if the intensities depend only on the current state h at time t, and any time-fixed covariates, but not the full past history. In the extended illness-death model, the Markov property holds provided that the death rate (intensity for the transition \(2\rightarrow 4\)) does not depend on the time of diagnosis of disease (entry into state 2). In practice, however, this assumption is often implausible. Importantly, the proposed approach does not rely on the Markov property.
The \(\beta\) parameter vector can now be written in terms of the multi-state process as
We assume that we observe independent, possibly right-censored realizations of the multi-state process \(X(\cdot )\). The observed data can be represented using counting processes \(N_{hji}(\cdot )\) for each individual i, where \(N_{hji}(t)\) counts the number of direct \(h\rightarrow j\) transitions up to time t. The observation window is restricted to \(t\le \tau _i\), the minimum of the time to absorption or censoring.
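A non-parametric estimator for \(\text {P}(0,t)\), assuming a homogeneous group, is obtained by plug-in of the Nelson–Aalen estimator
\[\widehat{\varLambda }_{hj}(t)=\int _0^t \frac{dN_{hj}(u)}{Y_{h}(u)}, \quad h\ne j,\]
where \(N_{hj}(u)=\sum _i N_{hji}(u)\), \(Y_{h}(u)=\sum _i Y_{hi}(u)\), and \(Y_{hi}(u)=1(X_{i}(u-)=h)\) indicates whether individual i is in state h just before time u. The diagonal elements are \(\widehat{\varLambda }_{hh}=-\sum _{j\ne h}\widehat{\varLambda }_{hj}\). The resulting estimator of the transition probability matrix is
\[\widehat{\text {P}}(s,t)=\prod _{(s,t]}\left( I+d\widehat{\varLambda }(u)\right) ,\]
known as the Aalen–Johansen estimator (Aalen and Johansen 1978), where \(\prod\) denotes the product integral. The estimated state occupation probabilities are given by the row vectors
\[{\widehat{Q}}(t)={\widehat{Q}}(0)\,\widehat{\text {P}}(0,t),\]
where \({\widehat{Q}}(0)\) is the empirical distribution of the initial state. Since \(\widehat{\varLambda }\) has jumps only at observed transition times, the product integral reduces to the ordinary matrix product
\[\widehat{\text {P}}(s,t)=\prod _{s<u\le t}\left( I+\Delta \widehat{\varLambda }(u)\right) ,\]
where u corresponds to transition times. The expected time spent in state j up to time t, often referred to as the expected length of stay, is estimated as \(\int _0^t {\widehat{Q}}_j(u)\,du\).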
In the absence of censoring, the Aalen-Johansen-based estimate of the multi-state parameter \(\alpha _2\) reduces to the corresponding uncensored estimate in Eq. (4).
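The Aalen–Johansen recursion for the extended illness-death model can be sketched as follows. This is a minimal, unoptimized illustration under stated assumptions, not the authors' code: the input format (a tuple of diagnosis time or `None`, end of observation, and a death flag) and the function name `aalen_johansen_los` are hypothetical, a subject's diagnosis and end times are assumed distinct, and censored subjects are counted as at risk at their censoring time.

```python
def aalen_johansen_los(subjects, tau):
    """State occupation probabilities at tau and expected length of stay in state 2.

    subjects: list of (diag_time or None, end_time, died) tuples.
    States are indexed 0..3 for the states 1..4 of the extended illness-death model.
    """
    events = []  # (time, from_state, to_state)
    for diag, end, died in subjects:
        if diag is not None:
            events.append((diag, 0, 1))      # 1 -> 2: diagnosis
            if died:
                events.append((end, 1, 3))   # 2 -> 4: death with diagnosis
        elif died:
            events.append((end, 0, 2))       # 1 -> 3: death without diagnosis

    def state_before(subject, u):
        """State occupied just before u, or None if no longer observed."""
        diag, end, _ = subject
        if u > end:
            return None
        return 1 if (diag is not None and u > diag) else 0

    q = [1.0, 0.0, 0.0, 0.0]                 # everyone starts in state 1
    los, prev = 0.0, 0.0
    for u in sorted({t for t, _, _ in events if t <= tau}):
        los += q[1] * (u - prev)             # integrate the step function Q_2
        at_risk = [0, 0, 0, 0]
        for s in subjects:
            h = state_before(s, u)
            if h is not None:
                at_risk[h] += 1
        new_q = q[:]
        for t, h, j in events:               # Q <- Q (I + dLambda(u))
            if t == u:
                flow = q[h] / at_risk[h]
                new_q[h] -= flow
                new_q[j] += flow
        q, prev = new_q, u
    los += q[1] * (tau - prev)
    return q, los

# Toy data without censoring (made up): two subjects diagnosed, two not.
toy = [(0.5, 1.5, True), (1.0, 2.0, False), (None, 1.2, True), (None, 2.0, False)]
probs, stay = aalen_johansen_los(toy, tau=2.0)
```

On the uncensored toy data, the returned length of stay matches the plain average of the individual times spent in state 2, in line with the statement that the Aalen–Johansen-based estimate reduces to the uncensored one.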
Variance estimation for the Aalen–Johansen estimator of \(\beta\), without assuming the Markov property, can be derived from its influence function (Glidden 2002). However, a more practical approach involves the use of pseudo-observations (Andersen et al. 2003), which serve as indirect estimates of the influence functions (Parner et al. 2023).
The pseudo-observation method is a flexible way to perform regression analysis for censored event data. The method relies on a well-defined estimator for the quantity of interest, \(\theta =\textrm{E}(V)\). Let \(\hat{\theta }\) represent the estimate of \(\theta\) based on the full sample \(X_1,\ldots , X_n\) and let \(\hat{\theta }^{(i)}\) denote the corresponding estimate obtained by leaving out observation \(X_i\), i.e., from the sample \(X_1,\ldots ,X_{i-1},\)\(X_{i+1},\ldots ,X_{n}\). Here \(X_i\) denotes the time-to-event data on subject i. The jack-knife pseudo-observation for the i-th observation is defined as
\[\hat{\theta }_{i}=n\hat{\theta }-(n-1)\hat{\theta }^{(i)}.\]
Assume a regression model \(\textrm{E}(V_i|Z_i)=\mu (\beta _0;Z_i)\), where \(\mu (\beta ; Z_i) = \mu (\beta ^T Z_i)\) is typically the inverse of the link function in a generalized linear model. Let \(A(\beta ;Z_i)\) be a vector function depending only on the regression parameters and covariates. Estimates of \(\beta _0\) are then obtained based on \(\hat{\theta }_{1},\ldots , \hat{\theta }_{n}\) by solving an estimating equation
\[\sum _{i=1}^{n}A(\beta ;Z_i)\left\{ \hat{\theta }_{i}-\mu (\beta ;Z_i)\right\} =0.\]
This equation corresponds to a generalized linear model where the pseudo-observation \(\hat{\theta }_{i}\) replaces the potentially unobserved \(V_i\). Pseudo-observations based on the Aalen–Johansen estimator have been shown to provide unbiased estimates when the censoring time is independent of both the multi-state process and the covariates (Overgaard et al. 2023). An alternative approach is given by the infinitesimal pseudo-observations, defined as \(\phi (F_n)+\dot{\phi }_{F_n}(X_i)\), where \(\phi (F_n)\) denotes the pooled estimator and \(\dot{\phi }_{F_n}(\cdot )\) the estimated influence function. These infinitesimal pseudo-observations share the same properties as the jackknife pseudo-observations (Parner et al. 2023). The infinitesimal pseudo-observations are the version implemented in the survival package in R. The variance of the infinitesimal pseudo-observations is an estimate of the variance in Glidden (2002). Pseudo-observations may be computed jointly across comparison groups or separately within each group. In the simulations and data applications, the pseudo-observations were computed separately within each group.
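The jackknife construction can be illustrated with a short Python sketch. It uses the standard leave-one-out definition, \(n\hat{\theta }-(n-1)\hat{\theta }^{(i)}\), and a plain sample mean as the estimator; the function names are hypothetical, and a censored-data application would plug in an Aalen–Johansen-based estimator instead.

```python
def jackknife_pseudo_observations(sample, estimator):
    """Return n * theta_hat - (n - 1) * theta_hat^(i) for each observation i."""
    n = len(sample)
    full = estimator(sample)
    return [
        n * full - (n - 1) * estimator(sample[:i] + sample[i + 1:])
        for i in range(n)
    ]

def sample_mean(values):
    return sum(values) / len(values)

# Toy uncensored durations (made up for illustration).
durations = [1.0, 2.0, 4.0, 7.0]
pseudo = jackknife_pseudo_observations(durations, sample_mean)
```

For the sample mean without censoring, each pseudo-observation equals the original observation, which is why the pseudo-observation can stand in for the potentially unobserved \(V_i\) in the estimating equation.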
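Let \(\hat{\theta }_{ij}\) denote the bivariate pseudo-observation for individual i within group j for the event probability and the expected duration, i.e., for \((\beta _1,\beta _2)\) in group 0 and \((\beta _3,\beta _4)\) in group 1.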
We use the covariance of the pseudo-observations to estimate the covariance \((\hat{\beta }_1,\hat{\beta }_2)\) in group 0 and \((\hat{\beta }_3,\hat{\beta }_4)\) in group 1. The asymptotic variance of \(\hat{\alpha }_2\) again follows from the delta method, given by the form \(\sigma _0^2/n_0+\sigma _1^2/n_1\), where \(\sigma _0, \sigma _1\) are functions of the covariance of \(\hat{\theta }_{ij}\), \(\varSigma _j\) say. Further details are provided in the appendix.
3 Simulations
We investigate three simulation scenarios to assess the performance of the proposed estimator: (1) its small-sample properties; (2) its power in comparison to the approach of comparing the average time with the disease among all individuals; and (3) its efficiency.
Scenario 1. We consider an extended illness-death model where all transitions follow a time-homogeneous Markov process, assuming constant hazard rates that are independent of event history. For the control group (standard follow-up), we use
The recurrence rate is \(\lambda _{12}(t|Z=0)=\lambda _{12}=0.116\) per year,
The mortality rate without recurrence is \(\lambda _{13}(t|Z=0)=\lambda _{13}=0.027\) per year,
The mortality with recurrence is \(\lambda _{24}(t|Z=0)=5\cdot \lambda _{13}(t|Z=0)\).
These rates correspond to a 2-year incidence of recurrence of 20% and a 2-year incidence of recurrence or death of 25%. These values were based on prior findings for vulvar cancer (Zach et al. 2021). Under this scenario, \((\beta _1,\beta _2)=(20\%,2.3 \text { months})\).
For the treated group (intensive follow-up), we assume accelerated time-to-disease with factor \(a=50\%\). Thus, the disease rate becomes: \(\lambda _{12}(t|Z=1)=a^{-1}\cdot \lambda _{12}(a^{-1}t|Z=0)\). The other rates are assumed unchanged, i.e., \(\lambda _{13}(t|Z=1)=\lambda _{13}(t|Z=0)\) and \(\lambda _{24}(t|Z=1)=\lambda _{24}(t|Z=0)\). This yields \((\beta _3,\beta _4)=(36\%,4.2 \text { months})\), indicating an average increase of 1.9 months in time with disease under treatment.
The causal parameter \(\alpha _2\) represents the average increase in time with recurrence under intensive follow-up among those who would have been diagnosed with recurrence under intensive follow-up. Assuming the monotonicity condition (\(D(0)=1 \Rightarrow D(1)=1\), i.e., any recurrence detected under standard follow-up would also be detected under intensive follow-up), the difference in time with recurrence increases to 5.5 months for this subgroup.
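The Scenario 1 parameters can be checked against closed-form expressions for a constant-hazard illness-death model. The sketch below is our own back-of-the-envelope calculation, not the authors' simulation code; it assumes \(\beta _1,\beta _3\) are the 2-year recurrence probabilities and \(\beta _2,\beta _4\) the expected times in state 2, and the function name is hypothetical. Results should land close to the reported values, with small differences possible from rounding.

```python
import math

def illness_death_summaries(lam12, lam13, lam24, tau):
    """Constant-hazard extended illness-death model.

    Returns (probability of diagnosis by tau, expected years in state 2 up to tau).
    Requires lam24 != lam12 + lam13.
    """
    out = lam12 + lam13  # total hazard of leaving state 1
    cum_inc = lam12 / out * (1.0 - math.exp(-out * tau))
    # P(X(t) = 2) = lam12 / (lam24 - out) * (exp(-out * t) - exp(-lam24 * t)),
    # integrated over [0, tau].
    k = lam12 / (lam24 - out)
    los = k * ((1.0 - math.exp(-out * tau)) / out
               - (1.0 - math.exp(-lam24 * tau)) / lam24)
    return cum_inc, los

lam12, lam13, tau = 0.116, 0.027, 2.0
lam24 = 5 * lam13
a = 0.5  # acceleration factor: recurrence rate becomes lam12 / a under treatment

b1, b2 = illness_death_summaries(lam12, lam13, lam24, tau)      # control
b3, b4 = illness_death_summaries(lam12 / a, lam13, lam24, tau)  # treated

# Effect in the principal stratum under monotonicity, in months.
alpha2_months = 12.0 * (b4 - b2) / b3
```

Dividing the overall difference in expected time with disease by the treated-group recurrence probability is what turns the modest population-level difference into the larger effect within the principal stratum.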
Simulations are conducted with equal group sizes \(n_0=n_1=50, 100, 200, 500\) and censoring rates \(\lambda _C=0,0.1\) per year. Table 1 reports the observed proportion of censored event data (\(p_C\)), the average estimate (Ave \(\hat{\alpha }_2\)) and its bias, \(\sqrt{n}\) times the standard deviation of \(\hat{\alpha }_2\) in months across 10,000 replications (\(\text {SD}_{\alpha _2}\)), \(\sqrt{n}\) times the average standard error from the Huber–White robust variance estimator (\(\hbox {Se}_{\text {HW}}\)), and the coverage of the 95% confidence interval based on the robust variance (Coverage). The bias is small for \(n_0=n_1\ge 100\), where the normal approximation and the variance estimate are also accurate; results remain acceptable for \(n_0=n_1= 50\).
Table 1 Small sample properties of the proposed estimate

| \(n_0=n_1\) | \(\lambda _{C}\) | \(p_C\) | Ave \(\hat{\alpha }_2\) | Bias | \(\sqrt{n}\text {SD}_{\alpha _2}\) | \(\sqrt{n}\text {Se}_{\text {HW}}\) | Coverage |
|---|---|---|---|---|---|---|---|
| 50 | 0.000 | 0.000 | 5.25 | −0.24 | 22.00 | 22.32 | 0.940 |
| 100 | 0.000 | 0.000 | 5.39 | −0.10 | 21.64 | 21.59 | 0.946 |
| 200 | 0.000 | 0.000 | 5.42 | −0.07 | 21.10 | 21.27 | 0.951 |
| 500 | 0.000 | 0.000 | 5.46 | −0.03 | 20.86 | 21.12 | 0.953 |
| 50 | 0.100 | 0.151 | 5.21 | −0.28 | 23.01 | 23.27 | 0.945 |
| 100 | 0.100 | 0.150 | 5.35 | −0.14 | 22.36 | 22.31 | 0.946 |
| 200 | 0.100 | 0.151 | 5.43 | −0.06 | 21.88 | 21.87 | 0.949 |
| 500 | 0.100 | 0.150 | 5.45 | −0.03 | 21.62 | 21.73 | 0.953 |
Scenario 2. This scenario compares the power of tests based on the proposed estimator \(\hat{\alpha }_2\) with an alternative estimator \(\hat{\alpha }_2^*\), which averages the time with recurrence across all individuals, regardless of whether recurrence occurred. Using the same transition model and parameters as in Scenario 1, we vary: the recurrence rate \(\lambda _{12}=0.116, 0.232\) per year and censoring rate \(\lambda _C=0,0.1,0.2\) per year. Table 2 summarizes the recurrence probability \(p_D\), the censoring proportion \(p_C\), the average Wald statistics w (for \(\hat{\alpha }_2\)) and \(w^*\) (for \(\hat{\alpha }_2^*\)), and the empirical power to reject the null hypothesis \(\alpha _2 = 0\) or \(\alpha _2^* = 0\) at the 5% level. All results are based on 10,000 simulations with \(n_0 = n_1 = 100\). The proposed estimator generally exhibits greater power than the conventional average treatment effect.
Table 2 The power for the hypothesis \(\alpha _2=0\) based on \(\hat{\alpha }_2\) and \(\hat{\alpha }_2^*\)

| \(\lambda _{12}\) | \(\lambda _{C}\) | \(p_D\) | \(p_C\) | w | Power \((\alpha _2)\) | \(w^*\) | Power \((\alpha _2^*)\) |
|---|---|---|---|---|---|---|---|
| 0.116 | 0.000 | 0.377 | 0.000 | 2.75 | 0.696 | 2.24 | 0.603 |
| 0.116 | 0.100 | 0.345 | 0.150 | 2.66 | 0.675 | 2.16 | 0.579 |
| 0.116 | 0.200 | 0.316 | 0.276 | 2.58 | 0.656 | 2.09 | 0.551 |
| 0.232 | 0.000 | 0.675 | 0.000 | 3.41 | 0.872 | 2.97 | 0.834 |
| 0.232 | 0.100 | 0.619 | 0.130 | 3.32 | 0.850 | 2.89 | 0.813 |
| 0.232 | 0.200 | 0.570 | 0.239 | 3.24 | 0.838 | 2.80 | 0.796 |
Scenario 3. In this scenario, we investigate the efficiency of the proposed estimator. We consider cases where the event of interest occurs with probability \(p_0=0.10,0.20, \ldots ,0.50\) in the control group (group 0) and \(p_1=0.10,0.20,\ldots , 0.60\) in treatment group (group 1) with the restriction \(p_0\le p_1\) (i.e., the event occurs at least as often in the treatment group). The time to the event of interest is assumed to follow a log-normal distribution with median 2 in group 0, and medians of 1 or 1.5 in group 1, with a standard deviation of 0.5 in both groups. This corresponds to an accelerated failure time model. The study is analyzed at time \(\tau =2\), such that \(\beta _3=p_1/2\). Figure 2 displays the ratio of \(\alpha _2\) to \(\sqrt{\sigma _0^2+\sigma _1^2}\) - that is, the Z-statistic based on a single observation in each comparison group - and the estimated required sample size to reject the null hypothesis of no group difference with 80% power. The \(\beta\) parameters and \(\sigma _0, \sigma _1\) are estimated from 1,000,000 simulations. The resulting sample size requirements appear feasible for many clinical studies across a wide range of scenarios.
Fig. 2
Expected Z-statistics for a single observation in each comparison group, along with the required sample size to reject the null hypothesis of no group difference with 80% power
The vulvar cancer study (DaVulvaRec) is ongoing and is expected to conclude in 2030. Based on the model assumptions outlined in Simulation Scenario 1, we illustrate a sample size calculation and conduct the final analysis using a simulated dataset with 200 participants per group, each observed over a complete 2-year follow-up. From a clinical standpoint, the monotonicity condition is highly plausible: any recurrence identified under standard follow-up would also be detected under intensive follow-up. Accordingly, the principal stratum is defined as patients who experience recurrence during intensive follow-up.
For the sample size calculation, a simulation of a large sample with 1,000,000 replications in each group yields a causal difference of 5.5 months, with standard deviations of \(\sigma _0=15.4\) months and \(\sigma _1=14.3\) months. Assuming the study aims to demonstrate that patients who experience recurrence under intensive follow-up gain at least 1 additional month compared to standard follow-up, 172 patients per group would be required to achieve 80% power using a one-sided z-test at the 2.5% significance level.
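The sample size figure can be reproduced with the usual two-sample formula for a one-sided z-test with unequal standard deviations. The sketch below assumes equal group sizes and a variance of the form \(\sigma _0^2/n_0+\sigma _1^2/n_1\); the function name and argument names are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect, margin, sigma0, sigma1, alpha=0.025, power=0.80):
    """Per-group n for a one-sided z-test of H0: effect <= margin, equal group sizes."""
    z_total = NormalDist().inv_cdf(1.0 - alpha) + NormalDist().inv_cdf(power)
    n = z_total ** 2 * (sigma0 ** 2 + sigma1 ** 2) / (effect - margin) ** 2
    return math.ceil(n)

# Inputs taken from the text: 5.5 months effect, 1 month margin, SDs in months.
n_per_group = sample_size_per_group(effect=5.5, margin=1.0, sigma0=15.4, sigma1=14.3)
```

With the inputs quoted in the text, this calculation returns the 172 patients per group stated above.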
At 2 years, the estimated recurrence rate in the simulated data was 38% (95% CI 32–45%) in the intensive follow-up group and 17% (95% CI 12–22%) in the standard follow-up group. The average duration of disease recurrence at 2 years was 4.6 months (95% CI 3.6–5.6) in the intensive follow-up group and 2.0 months (95% CI 1.2–2.7) in the standard follow-up group, yielding a difference of 2.6 months (95% CI 1.3–3.9). This implies that among patients who experienced recurrence under intensive follow-up, the estimated additional time spent with disease compared with standard follow-up was 6.8 months (95% CI 4.1–9.5), favoring the intensive follow-up group.
4.2 Example 2: Colon cancer
The data originate from a randomized clinical trial evaluating adjuvant chemotherapy for colon cancer. Levamisole, a compound with low toxicity originally used to treat parasitic infections in animals, was one of the treatment agents. The second agent, 5-fluorouracil (5-FU), is a moderately toxic chemotherapy drug. Patients were randomly assigned to one of three groups: observation only, levamisole alone (administered orally at 50 mg three times daily for 3 days, repeated every 2 weeks for one year), or a combination of levamisole with 5-FU (450 mg/\(\hbox {m}^2\) administered intravenously for 5 consecutive days, followed by weekly administration starting on day 28 for 48 weeks). The primary outcomes were cancer recurrence and mortality. Baseline covariates were also collected. For additional details, see Moertel et al. (1995). The dataset is available via the survival package in R.
In this analysis, we compare two groups: those assigned to Observation Only (coded as \(Z=0\)) and those assigned to the Lev+5FU treatment (coded as \(Z=1\)). The treatment is intended to reduce the recurrence rate. Since the control group shows a higher recurrence rate, it now defines the principal stratum. This corresponds to reversing the roles of 0 and 1 relative to the vulvar cancer example. The analysis focuses on comparing the average duration of time with cancer recurrence over a 7-year follow-up horizon among patients who would have experienced recurrence under the Observation Only condition. The time point was chosen near the longest follow-up. The monotonicity assumption states that if a patient experiences recurrence under Lev+5FU (\(D(1) = 1\)), then the patient would also experience recurrence under Observation Only (\(D(0) = 1\)). The monotonicity assumption appears largely reasonable. It provides a lower bound for the causal effect. Both the plausibility of monotonicity and the degree of deviation, captured by a sensitivity parameter, can be assessed by subject-matter experts (Shepherd et al. 2007). We further illustrate the potential impact of violations of this assumption. Although the recurrence rate may be higher in the Observation Only group than in the Lev+5FU group, the disease duration does not necessarily have to be longer, due to the competing risk of death within the principal stratum. Consequently, the causal effect may not be positive, which is one of the key motivations for investigating the principal stratum.
Fig. 3
State occupation probabilities with 95% confidence intervals
A total of 315 patients were randomized to the Observation Only group and 304 to the Lev+5FU group. The median follow-up duration was 55.7 months, with 38% of patients censored before 7 years follow-up. The Aalen-Johansen estimate of the state occupation probabilities is shown in Fig. 3. At 7 years, the recurrence rate was estimated at 57% (95% CI 51–62%) in the Observation Only group and 39% (34–45%) in the Lev+5FU group. The risk of death with recurrence was seen to be higher in the Observation Only group as compared to the Lev+5FU group, consistent with the monotonicity assumption (Fig. 3).
The average duration of disease recurrence is the expected length of stay in the Recurrence state, given by the integral of the Recurrence state occupation probability curve (Fig. 3). The average duration of disease recurrence at 7 years is 10.7 months (95% CI 9.0–12.5) among Observation Only patients and 5.9 months (95% CI 4.5–7.3) among treated patients, with a difference of 4.8 months (95% CI 2.6–7.0). This suggests that among patients who experience recurrence under Observation Only, the estimated difference in time spent with disease between the Observation Only and treatment groups is 8.5 months (95% CI 4.8–12.1), favoring the treatment group. In a sensitivity analysis, assuming that the sensitivity parameter \(\gamma =0.90\), the estimated difference in time spent with disease between the Observation Only and treatment groups increases to 9.5 months (95% CI 6.1–13.0).
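The point estimates above can be approximately reproduced from the rounded summary figures. The sketch below rests on an assumption: it uses our reading of the sensitivity formula, in which the mean duration in the comparison group is multiplied by \(\gamma\) before the difference is rescaled by the recurrence probability in the group defining the principal stratum. The function name is hypothetical, and because the inputs are rounded, results agree with the reported values only to about one decimal.

```python
def principal_stratum_difference(mean_ref, mean_other, p_event_ref, gamma=1.0):
    """Difference in mean duration within the principal stratum.

    mean_ref, p_event_ref: mean duration and event probability in the group
    that defines the principal stratum (here Observation Only).
    mean_other: mean duration in the comparison group (here Lev+5FU).
    gamma: sensitivity parameter; gamma = 1 corresponds to monotonicity.
    """
    return (mean_ref - gamma * mean_other) / p_event_ref

# Rounded inputs from the text: 10.7 and 5.9 months, recurrence rate 57%.
under_monotonicity = principal_stratum_difference(10.7, 5.9, 0.57)
with_gamma_090 = principal_stratum_difference(10.7, 5.9, 0.57, gamma=0.90)
```

Lowering \(\gamma\) below 1 shrinks the subtracted term, so the estimated difference in the principal stratum can only grow, which is why the monotonicity-based estimate acts as a lower bound here.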
5 Discussion
In clinical studies where an intervention is designed to benefit a specific subgroup of patients, it is important to quantify the causal effect within that subgroup—referred to as a principal stratum. In this paper, we considered the average treatment effect for a duration outcome within the principal stratum of individuals who would experience a positive duration under one treatment. We demonstrated how this causal effect can be identified from observational data under a monotonicity assumption.
Furthermore, we showed that hypothesis testing based on the causal effect in the principal stratum offers greater statistical power than comparisons based on the overall treated and control groups. For censored duration outcomes, we illustrated how the causal effect can be estimated using multi-state models in conjunction with pseudo-observations to handle censoring.
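If the conditional exchangeability assumption
\[(T(z), D(z)) \perp Z \mid L, \quad z=0,1,\]
holds for a set of covariates L, then, as shown in the appendix, the causal effect under monotonicity can be identified as
\[\text {E}[T(1)-T(0) \mid D(1)=1] = \frac{\text {E}\{\text {E}(T|Z=1,L)-\text {E}(T|Z=0,L)\}}{\text {E}\{\text {P}(D=1|Z=1,L)\}},\]
where the outer expectations are taken over the distribution of L.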
This identification strategy requires regression models for both \(\text {P}(D=1|Z=1,L)\) and \(\text {E}(T|Z,L)\). Alternatively, noting that \(\text{E}(T|L,Z)=\text{E}(T|L,Z,D=1)\cdot\text{P}(D=1|L,Z)\), the causal effect can instead be expressed using models for \(\text {P}(D=1 |Z=1, L)\) and \(\text {E}(T | Z,L, D=1)\). As with the suggested multi-state model approach, pseudo-observations can be employed to account for censoring, and the variance of the resulting estimator can be derived following standard techniques (Newey and McFadden 1994). In some settings, this formulation may improve efficiency compared with the estimator presented in this manuscript when the unconditional exchangeability assumption holds. However, there is a trade-off: including models for \(\text {P}(D=1|Z=1, L)\) will generally increase variance, while modeling \(\text {E}(T|Z,L)\) or \(\text {E}(T|Z,L,D=1)\) may reduce it. In Simulation Scenario 1, when including an additional disease severity covariate with "Low" or "High" levels occurring at 50% frequency, applying models for \(\text {P}(D=1|Z=1, L)\) and \(\text {E}(T|Z, L)\) resulted in a slight increase in the variance of the \(\alpha _2\) estimator (data not shown). Nevertheless, this alternative approach and its detailed implementation are beyond the scope of the present paper.
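As a rough illustration of the covariate-adjusted strategy, the following sketch standardizes stratum-specific means over a single binary covariate. It is a minimal example under strong simplifying assumptions, not the estimator used in the paper: there is no censoring (so no pseudo-observations are needed), the regression models are replaced by saturated within-stratum means, and the data-generating values, variable names, and the function `standardized_ps_effect` are all hypothetical.

```python
import numpy as np


def standardized_ps_effect(t, d, z, l):
    """Plug-in estimate of E[T(1) - T(0) | D(1) = 1] for a discrete covariate L.

    Numerator:   sum_l P(L=l) * {E(T | Z=1, L=l) - E(T | Z=0, L=l)}
    Denominator: sum_l P(L=l) * P(D=1 | Z=1, L=l)
    """
    t, d, z, l = map(np.asarray, (t, d, z, l))
    numerator = 0.0
    denominator = 0.0
    for level in np.unique(l):
        in_level = l == level
        weight = in_level.mean()                      # P(L = level)
        treated = in_level & (z == 1)
        control = in_level & (z == 0)
        numerator += weight * (t[treated].mean() - t[control].mean())
        denominator += weight * d[treated].mean()     # P(D=1 | Z=1, L=level)
    return numerator / denominator


# Simulated observational data (all values are arbitrary assumptions).
rng = np.random.default_rng(2024)
n = 200_000
l = rng.integers(0, 2, size=n)                        # severity: 0 = Low, 1 = High
z = rng.binomial(1, np.where(l == 1, 0.7, 0.3))       # treatment depends on L
d1 = rng.binomial(1, np.where(l == 1, 0.6, 0.3))      # event indicator under z = 1
d0 = d1 * rng.binomial(1, 0.5, size=n)                # monotonicity: D(0)=1 implies D(1)=1
t1 = d1 * rng.exponential(12.0, size=n)               # duration is zero without the event
t0 = d0 * rng.exponential(6.0, size=n)
t = np.where(z == 1, t1, t0)                          # observed duration
d = np.where(z == 1, d1, d0)                          # observed event indicator

estimate = standardized_ps_effect(t, d, z, l)
naive = (t[z == 1].mean() - t[z == 0].mean()) / d[z == 1].mean()
truth = (t1 - t0)[d1 == 1].mean()                     # known only in a simulation

print(f"adjusted: {estimate:.2f}  unadjusted: {naive:.2f}  truth: {truth:.2f}")
```

In this simulated setting the true effect is 9 by construction. The unadjusted ratio is confounded by L, whereas the standardized version should land close to the truth for large samples.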
Acknowledgements
We gratefully acknowledge the constructive comments from the referee. In particular, the referee noted that Eq. (1) holds if and only if the condition \(\text {P}(D(1) = 1 | D(0) = 1) = 1\) is satisfied, and suggested that this property can be used to define a sensitivity parameter.
Open Access This article is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License, which permits any non-commercial use, sharing, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if you modified the licensed material. You do not have permission under this licence to share adapted material derived from this article or parts of it. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-nd/4.0/.
Note that \(\gamma \le 1\). Under monotonicity (\(D(0)=1 \Rightarrow D(1)=1\)), we have \(T(0)1(D(1)=1)=T(0)\), which directly implies \(\gamma =1\). More generally, \(\gamma\) can be treated as a sensitivity parameter to assess how violations of monotonicity affect the results.
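The role of \(\gamma\) can be illustrated with simulated potential outcomes. The sketch below assumes the definition \(\gamma = \text{E}[T(0)1(D(1)=1)] / \text{E}[T(0)]\), which is inferred from the statement above and not quoted from the paper. The simulation settings and the function name are hypothetical.

```python
import numpy as np


def gamma_from_potential_outcomes(t0, d1):
    """Share of E[T(0)] contributed by individuals with D(1) = 1.

    Assumed definition: gamma = E[T(0) * 1(D(1)=1)] / E[T(0)].
    """
    t0 = np.asarray(t0, dtype=float)
    d1 = np.asarray(d1)
    return (t0 * (d1 == 1)).mean() / t0.mean()


rng = np.random.default_rng(7)
n = 100_000
d1 = rng.binomial(1, 0.5, size=n)                     # event indicator under z = 1

# Monotone case: D(0) = 1 occurs only among individuals with D(1) = 1.
d0_monotone = d1 * rng.binomial(1, 0.6, size=n)
# Violation: 5% of individuals with D(1) = 0 nevertheless have D(0) = 1.
d0_violated = np.where(d1 == 1, d0_monotone, rng.binomial(1, 0.05, size=n))

duration = rng.exponential(6.0, size=n)               # T(0) is positive only if D(0) = 1
gamma_monotone = gamma_from_potential_outcomes(d0_monotone * duration, d1)
gamma_violated = gamma_from_potential_outcomes(d0_violated * duration, d1)

print(f"gamma under monotonicity: {gamma_monotone:.3f}")
print(f"gamma with violations:    {gamma_violated:.3f}")
```

Under monotonicity the two arrays being averaged coincide, so the ratio equals 1. Once some individuals with \(D(1)=0\) have a positive duration under \(z=0\), the ratio drops below 1.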
If, instead, the conditional exchangeability assumption
holds for a set of covariates L, then we can express
Aalen OO, Johansen S (1978) An empirical transition matrix for non-homogeneous Markov chains based on censored observations. Scand J Stat 5:141–150
Harrell F (2021) Assessing heterogeneity of treatment effect, estimating patient-specific efficacy, and studying variation in odds ratios, risk ratios, and risk differences. https://www.fharrell.com/post/varyor/. Accessed 19 Jun 2024
Hauck WW, Anderson S, Marcus SM (1998) Should we adjust for covariates in nonlinear regression analyses of randomized trials? Control Clin Trials 19(3):249–256. https://doi.org/10.1016/s0197-2456(97)00147-5
Hernán MA, Robins JM (2020) Causal inference: what if. Chapman & Hall/CRC, Boca Raton
ICH E9 (R1) (2020) ICH E9(R1) addendum on estimands and sensitivity analysis in clinical trials to the guideline on statistical principles for clinical trials. EMA/CHMP/ICH/436221/2017, Step 5
Moertel CG, Fleming TR, Macdonald JS, Haller DG, Laurie JA, Tangen CM, Ungerleider JS, Emerson WA, Tormey DC, Glick JH et al (1995) Fluorouracil plus levamisole as effective adjuvant therapy after resection of stage III colon carcinoma: a final report. Ann Intern Med 122(5):321–326. https://doi.org/10.7326/0003-4819-122-5-199503010-00001
Parner ET, Andersen PK, Overgaard M (2023) Regression models for censored time-to-event data using infinitesimal jack-knife pseudo-observations, with applications to left-truncation. Lifetime Data Anal 29(3):654–671. https://doi.org/10.1007/s10985-023-09597-5
Shepherd BE, Gilbert PB, Mehrotra DV (2007) Eliciting a counterfactual sensitivity parameter. Am Stat 61(1):56–63
Zach D, Åvall-Lundqvist E, Falconer H, Hellman K, Johansson H, Rådestad AF (2021) Patterns of recurrence and survival in vulvar cancer: a nationwide population-based study. Gynecol Oncol 161(3):748–754. https://doi.org/10.1016/j.ygyno.2021.03.013