Introduction
Gastric cancer is the second most common cause of cancer mortality worldwide [
1]. Patients with advanced disease represent over two-thirds of newly diagnosed cases [
2]. Despite advances in diagnosis and treatment, the prognosis for patients with advanced disease remains poor, with reported median survival ranging from 5.3 to 10.2 months [
3]. Several randomised trials demonstrated survival and quality of life (QL) benefits of chemotherapy compared with best supportive care [
4]. Fluorouracil (5-FU) remains one of the most widely used drugs, and several other agents, including cisplatin and anthracyclines, have been investigated in doublet and triplet combinations with 5-FU or capecitabine [
5‐
7]. The survival advantage of any of these combination regimens, compared with each other, is small and as such no internationally accepted standard of care regimen has emerged [
6,
8]. The primary objectives of treatment in this palliative setting are to relieve symptoms, improve QL and prolong survival [
9,
10]. However, a recent literature review and meta-analysis concluded that the impact of chemotherapy-related toxicity on the patients’ quality of life has been insufficiently studied in patients with advanced gastric cancer [
7].
Webb et al. compared the combination of epirubicin, cisplatin and 5-FU (ECF) with 5-FU, doxorubicin and methotrexate (FAMTX) in previously untreated patients with advanced esophagogastric cancer [
11]. Using the EORTC QLQ-C30, the authors showed that ECF resulted in a better QL at 24 weeks compared with FAMTX. Subsequently, Ross et al. (2002) showed that ECF resulted in a better QL at 3 and 6 months when compared with mitomycin C, cisplatin and 5-FU (MCF) [
12]. ECF has never been directly compared with CF, although a meta-analysis suggests ECF has a survival advantage over CF. However, concerns remain over the toxicity of ECF and the role of epirubicin in the combination [
13]. More recently, Van Cutsem et al. (2006) compared the combination treatments docetaxel plus cisplatin and fluorouracil (DCF) vs. cisplatin and fluorouracil (CF) as first-line therapy for advanced gastric cancer [
14]. The study met its primary endpoint, showing a significant improvement in time-to-progression (TTP) with DCF compared with CF; improvements in survival and response rate were also reported. Although higher incidences of toxicity were observed in the DCF treatment arm, this did not appear to impact on QL, which was significantly better in the DCF arm. These results suggest that better tumour control may also have led to better symptom control in the DCF arm [
14,
15].
Following promising results using irinotecan in combination with either 5-FU or cisplatin in phase II trials [
16‐
19], a phase II–III trial was initiated in previously untreated advanced gastric adenocarcinoma patients comparing irinotecan plus cisplatin with irinotecan plus 5-FU given as an infusional AIO regimen (IF) [
20]. Based on the risk/benefit ratio for IF in this study, a phase III trial was designed to compare IF to cisplatin plus 5-FU administered using a 5-day infusional regimen (CF). QL results for the phase III part of the study are reported here. The clinical results and initial QL results have been presented elsewhere [
21].
As with other studies [
14], the initial analysis of the QL data for the current study was carried out using time-to-event analysis (e.g. time to a 5% deterioration of the global health status/QL scale) in accordance with the statistical analysis plan. It is generally considered that for QL data, time-to-event analysis is limited since it does not take into account the repeated measures aspect of the data or the potential bias introduced by missing data. The analysis presented in this report addresses the fact that QL is a process and consequently is subject to change over time, that measurements taken at different time points are correlated, and that patients drop out during the study or have intermittent missing data, thus taking the entire structure of the QL data into account.
Patients and methods
Patient eligibility
Patients were to have histologically confirmed adenocarcinoma (including diffuse type, intestinal type and linitis) of the stomach or esophagogastric junction, with measurable or evaluable metastatic disease (cytology or histology was mandatory if a single metastatic lesion was the only manifestation of disease) or locally recurrent disease with at least one measurable lymph node. Patients were also required to be between 18 and 75 years of age, have Karnofsky performance status (KPS) >70%, life expectancy >3 months and adequate haematological parameters. The study was conducted in accordance with the Declaration of Helsinki and was approved by national or local ethics committees, as appropriate. All patients provided written informed consent. Further details regarding patient eligibility are provided elsewhere [
21].
Study treatments
Subjects randomised to the IF arm were scheduled to receive irinotecan 80 mg/m² over a 30-min i.v. infusion, followed by FA 500 mg/m² as a 2-h i.v. infusion, immediately followed by 5-FU 2,000 mg/m² over a 22-h i.v. infusion, day 1 every week for 6 weeks followed by a 1-week rest. In the CF arm, patients were scheduled to receive cisplatin 100 mg/m² as a 1–3-h i.v. infusion, day 1, followed by 5-FU 1,000 mg/m²/day over a 24-h i.v. infusion, days 1–5, every 4 weeks. Treatment was administered until disease progression, unacceptable toxicity or withdrawal of consent.
Study design
The primary objective of the phase III part of the study was to detect a statistically significant increase in TTP for the IF arm relative to the CF arm. In addition, a non-inferiority comparison was specified in the protocol, in case of a non-significant trend towards superiority of TTP for the IF arm [
22,
23]. Tumour progression was assessed according to World Health Organization Criteria and TTP measured from randomisation until the date of progression or death. Randomisation was performed using the minimisation technique [
24], stratifying patients according to presence of measurable vs. evaluable disease, liver involvement, baseline weight loss, prior surgery and centre.
QL assessments
The EORTC QLQ-C30 (version 3.0) and EuroQoL (EQ-5D) instruments were used in this study. The QLQ-C30 is a cancer specific, self administered assessment of QL. The scale scores were calculated as per the scoring procedure defined in the EORTC QLQ-C30 Scoring Manual [
25]. The EQ-5D is also a self-administered instrument comprising five questions and a visual analogue scale, which represents a rating of the patient’s health state [
26]. The five single items are combined to obtain a health utility index (HUI) score. This report focuses on the global health status/QL, physical functioning, social functioning, pain and nausea/vomiting scales of the EORTC QLQ-C30 and the two EQ-5D scales.
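As an illustration of the scale scoring referred to above, the following is a minimal sketch of the linear 0 to 100 transformation described in the EORTC QLQ-C30 Scoring Manual. The function name `scale_score` and its arguments are hypothetical, and the rule for unanswered items (score the scale only if at least half of its items were answered) is stated here as an assumption rather than as the procedure actually applied in this study.

```python
from typing import Optional, Sequence


def scale_score(items: Sequence[Optional[int]],
                item_range: int,
                functional: bool) -> Optional[float]:
    """Linear 0-100 transformation of one QLQ-C30 scale (illustrative sketch).

    items      -- raw responses for the items of the scale (None = unanswered)
    item_range -- maximum minus minimum possible response
                  (3 for items scored 1-4, 6 for the global items scored 1-7)
    functional -- True for functioning scales, where a high score means
                  better functioning; False for symptom and global scales
    """
    answered = [x for x in items if x is not None]
    # Assumed missing-item rule: require at least half of the items.
    if not items or 2 * len(answered) < len(items):
        return None
    raw = sum(answered) / len(answered)  # raw score = mean of answered items
    if functional:
        return (1 - (raw - 1) / item_range) * 100
    return (raw - 1) / item_range * 100
```

For example, a patient answering 2 and 3 on a two-item symptom scale would obtain a raw score of 2.5 and a transformed score of 50.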
Quality of life assessments were required at baseline, every 8 weeks until documentation of disease progression and every 3 months from the documentation of the progression until death. To be considered evaluable at baseline, a questionnaire must have been filled in within 15 days before randomisation. To be considered evaluable on treatment, a questionnaire had to be filled in more than 5 days (IF arm) or 9 days (CF arm) after the start of the latest infusion so as not to take into account the immediate toxicities following infusion. The different lag durations after the start of the infusion allowed for the different infusion durations to be taken into account (1 day in the IF arm, 5 days in the CF arm).
Questionnaires without a date of assessment, or filled in after the cut-off date or after a further anti-tumour therapy were excluded (i.e. considered non-evaluable). Since assessments were planned to be evaluated independently from cycle duration, data were to be analysed according to time windows (8 week periods, i.e. ±4 weeks of the theoretical assessment date for assessments before investigator documented progressive disease). In case of multiple evaluable questionnaires in a time window, the mean value per subject for each scale in the time window was calculated.
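The windowing rule above can be sketched as follows. This is an illustration only: the function names, the record layout and the handling of window boundaries (window k is assumed to cover days 56k − 27 to 56k + 28 after randomisation) are assumptions, since the text specifies only that windows span ±4 weeks around each planned 8-weekly assessment.

```python
from collections import defaultdict
from statistics import mean

WINDOW_DAYS = 56  # planned assessments every 8 weeks


def window_index(days_since_randomisation: int) -> int:
    """Index of the nearest planned assessment (1 = week 8, 2 = week 16, ...).

    Assumed boundary rule: window k covers days 56k - 27 .. 56k + 28.
    Index 0 would hold early post-baseline questionnaires, whose handling
    is not described in the text.
    """
    return (days_since_randomisation + WINDOW_DAYS // 2 - 1) // WINDOW_DAYS


def window_means(records):
    """Average multiple evaluable questionnaires within a window.

    records -- iterable of (patient_id, days_since_randomisation, score)
    Returns {(patient_id, window): mean score}.
    """
    grouped = defaultdict(list)
    for patient_id, day, score in records:
        grouped[(patient_id, window_index(day))].append(score)
    return {key: mean(scores) for key, scores in grouped.items()}
```

With two evaluable questionnaires at days 50 and 70 scoring 60 and 80, the patient contributes a single value of 70 to the week-8 window.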
Questionnaires excluded from the analysis were either considered present but not evaluable (see the description above) or missing. The reason for each missing questionnaire was collected on the CRF pages. The reasons were categorised as follows: random (i.e. administrative and similar reasons not directly related to patient QL), QL related (e.g. the patient was too ill to complete the questionnaire) or dead.
Statistical methods
Quality of life compliance was calculated as the ratio of the total number of subjects with at least one evaluable questionnaire per time window over the total number of expected questionnaires [
27]. Patients were counted in the total number of expected questionnaires in the window only until further anti-tumour therapy or death prevented assessment.
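The compliance ratio defined above can be sketched as below. The function name and the input layout (sets of patient identifiers per time window) are hypothetical; deciding which patients are still expected in a window, i.e. before further anti-tumour therapy or death, is assumed to have been done upstream.

```python
def compliance_by_window(evaluable, expected):
    """Compliance per time window (illustrative sketch).

    evaluable -- {window: set of patient ids with >= 1 evaluable questionnaire}
    expected  -- {window: set of patient ids from whom one was expected}
    Returns {window: received / expected}.
    """
    rates = {}
    for window, expected_ids in expected.items():
        if not expected_ids:
            continue  # no questionnaire expected, compliance undefined
        # Count only patients who were actually expected in this window.
        received = evaluable.get(window, set()) & expected_ids
        rates[window] = len(received) / len(expected_ids)
    return rates
```

For instance, if four patients were expected in the first window and three returned an evaluable questionnaire, compliance for that window is 0.75.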
Summary measures of QL scores were generated: i.e. the minimum, maximum and mean post-baseline QL scores within each patient, for each scale over all evaluable questionnaires, were calculated and summarised by treatment group. The Wilcoxon non-parametric test was used to compare treatment groups as the summary measures, particularly for the minimum and maximum, were not expected to be normally distributed.
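A minimal sketch of this step is given below, assuming each patient's evaluable post-baseline scores for one scale are available as a list. The function names are hypothetical. The rank-sum comparison is shown through the Mann-Whitney U statistic, which is equivalent to the Wilcoxon rank-sum statistic; only the statistic is computed, and the p-value would in practice come from a statistical package.

```python
from statistics import mean


def summary_measures(scores_by_patient):
    """Minimum, maximum and mean post-baseline score within each patient."""
    return {
        pid: {"min": min(s), "max": max(s), "mean": mean(s)}
        for pid, s in scores_by_patient.items()
        if s  # patients without post-baseline data contribute nothing
    }


def rank_sum_u(group_a, group_b):
    """Mann-Whitney U statistic for group_a, using midranks for ties."""
    a, b = list(group_a), list(group_b)
    pooled = sorted(a + b)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        # Tied values occupy ranks i+1 .. j; assign their average.
        ranks[pooled[i]] = (i + 1 + j) / 2
        i = j
    rank_sum_a = sum(ranks[x] for x in a)
    return rank_sum_a - len(a) * (len(a) + 1) / 2
```

The per-patient minima (or maxima, or means) of the two arms would be passed as `group_a` and `group_b`; a U close to 0 or to the product of the two group sizes indicates strong separation between arms.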
A logistic regression model was fitted to test if the QL data missing from patients who dropped out was missing completely at random (MCAR) [
28]. The model included terms for time (as a linear variable expressing the 8-weekly assessment time points), treatment (as a binary variable), time by treatment interaction and two terms representing global health status/QL scores: (1) the sum of the two previous scores and (2) the difference between the two previous scores. The
P-value for the Wald chi-squared statistic was used to test the effect of QL scores on dropout.
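To make the structure of this dropout model concrete, the sketch below builds one record per patient and assessment, containing the covariates listed above and a dropout indicator. It is an illustration under several assumptions: dropout is monotone, each patient's observed scores are supplied in time order starting at baseline, the difference is taken as the most recent score minus the one before it, and the function and field names are hypothetical. Fitting the logistic regression itself is left to a statistical package.

```python
def dropout_rows(scores, treatment, n_planned):
    """Rows for a logistic regression of dropout on previous QL scores.

    scores    -- observed scores in time order (index 0 = baseline);
                 the patient drops out after the last observed score
    treatment -- binary treatment indicator (0 or 1)
    n_planned -- total number of planned assessments, baseline included
    """
    rows = []
    # Two previous scores are needed, so the first usable time point is 2.
    last_t = min(len(scores), n_planned - 1)
    for t in range(2, last_t + 1):
        prev1, prev2 = scores[t - 1], scores[t - 2]
        rows.append({
            "time": t,
            "treatment": treatment,
            "time_x_treatment": t * treatment,
            "prev_sum": prev1 + prev2,
            "prev_diff": prev1 - prev2,  # assumed direction: latest minus earlier
            "dropout": int(t == len(scores)),  # 1 if no score observed at t
        })
    return rows
```

A patient with scores 60, 50 and 30 who then drops out yields a non-dropout row at time 2 and a dropout row at time 3 with a low sum and a negative difference, which is the pattern the Wald test is designed to detect.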
A pattern-mixture model was fitted in SAS using Proc Mixed [
29,
30]. This model allows one to model the repeated measures structure of the data taking into account the dropout pattern. Terms in the model included treatment, time, dropout pattern and their interactions. Thus, a priori, the fixed effects as well as the covariance parameters were allowed to vary unconstrained according to the dropout pattern. In addition, several baseline clinical variables (age, gender, WHO performance status, pain assessed by the clinician, prior surgery and weight loss) were considered as covariates in the model. Model reduction was carried out using a likelihood ratio test to identify the most parsimonious model consistent with the data. If the treatment effect in the final model was pattern dependent, the delta method would be used to obtain the marginal treatment effect [
31]. As such, the treatment effect is estimated within each pattern and the overall marginal treatment effect is estimated using a weighted summation of the individual within-pattern treatment effects, weighted according to the proportion of subjects in each dropout pattern. The null hypothesis of no treatment effect would be tested using a Wald statistic, which approximately follows a chi-squared distribution with one degree of freedom.
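The weighted summation and Wald test described above can be sketched as follows. This is an illustration rather than the SAS implementation used in the study: the function name is hypothetical, the within-pattern estimates are assumed to be independent of one another and of the pattern proportions, and the proportions are treated as multinomial so that the delta method adds a term for their sampling variability.

```python
import math


def marginal_effect(effects, variances, counts):
    """Marginal treatment effect across dropout patterns (sketch).

    effects   -- within-pattern treatment effect estimates
    variances -- variances of those estimates
    counts    -- number of subjects in each dropout pattern
    Returns (estimate, variance, Wald statistic, p-value).
    """
    n = sum(counts)
    props = [c / n for c in counts]
    # Weighted sum of within-pattern effects.
    theta = sum(p * b for p, b in zip(props, effects))
    # Delta method, part 1: uncertainty in the within-pattern effects.
    var_effects = sum(p ** 2 * v for p, v in zip(props, variances))
    # Delta method, part 2: uncertainty in the estimated proportions.
    var_props = (sum(p * b ** 2 for p, b in zip(props, effects)) - theta ** 2) / n
    variance = var_effects + var_props
    wald = theta ** 2 / variance
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(wald / 2))
    return theta, variance, wald, p_value
```

With two equally sized patterns of 50 subjects, within-pattern effects of 4 and 8 points and variances of 4 each, the marginal effect is 6 points with a variance of 2.04, of which 0.04 comes from estimating the pattern proportions.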
Discussion
In this study, preliminary analysis using summary measures was carried out in an exploratory fashion. There were a number of significant results in the comparison of the two treatment groups, consistently indicating a better QL in the IF treatment group. The main differences between treatment groups were observed for the physical functioning and nausea/vomiting scales from the EORTC QLQ-C30 and the two EQ-5D scales. Non-significant trends towards a difference in favour of IF were observed for the social functioning, pain and global health status/QL scales.
More questionnaires were completed in the IF arm than in the CF arm. As such, the probability of observing an extreme result (e.g. minimum or maximum) is increased in the IF arm, since the more frequently a process is observed the more often one will observe an extreme result. The number of questionnaires and the patterns of completion of questionnaires also varied considerably between patients. Missing data were prevalent. It was shown that missing data at earlier time points were due mainly to random reasons, e.g. administrative failure, whereas questionnaires at later time points were missing mainly due to death. As such, it may be argued that intermittent missing questionnaires were primarily due to random reasons (i.e. MCAR), whereas monotone missingness was due to progression of disease or death. This latter point is supported by testing the dropout process, which indicated that questionnaires at the time of dropout were not MCAR. The results indicated that if QL scores were low then the probability of dropout was high. This phenomenon was confirmed when plotting the mean global health status/QL scores over time by dropout pattern. The imbalance in compliance and the dropout of patients suggest that simplified analyses such as time-to-event analysis and analysis using summary measures may be biased. Consequently, more complex analyses were carried out using pattern-mixture models to reduce any potential bias.
The final pattern-mixture model indicated that mean QL scores were dependent on dropout pattern and that the variance–covariance structure had an autoregressive component. Analysing the data without taking this information into account could be considered wasteful and potentially biased. Using the pattern-mixture model, significant treatment differences were observed for the physical functioning scale, the nausea/vomiting scale and the EQ-5D thermometer in favour of the IF treatment arm. These results were mainly consistent with those using the mean as a summary measure. However, for most scales the treatment effect was less significant using the pattern-mixture model. This is partially explained by the fact that between-patient variation is artificially reduced using summary measures, thus resulting in larger effect sizes. The findings of the QL analysis are also consistent with the toxicity profile as recorded through adverse event reporting [
21]. While the rates of diarrhoea, cholinergic syndrome and fever without infection were higher in patients receiving IF, these symptoms were manageable in the current study. The higher rates of neurological toxicities, anorexia, stomatitis, alopecia, febrile neutropenia/neutropenic infection, thrombocytopenia and creatinine elevation in the CF arm, in addition to nausea and vomiting, are consistent with a negative impact on the physical functioning and nausea/vomiting scales. This was also reflected in the significantly higher number of withdrawals due to treatment-related AEs in the CF arm. In addition, the previously reported advantages in terms of efficacy (TTP and time to treatment failure) and clinical benefit (KPS, appetite and weight loss) all favoured patients receiving IF [
21].
Analysis of the QL data using pattern-mixture analysis yielded more significant results than using time-to-event analysis. The original analysis of this study, like those of other studies, used time-to-event analyses [
14,
21]. However, analyses of QL data using time-to-event methods are limited and potentially biased for a number of reasons. For example, when analysing the global health status/QL scale, 58% of patients had censored time-to-events. In the analysis of QL data from advanced gastric cancer patients, where the time-to-event is time-to-deterioration of QL, one could argue that there is informative censoring, i.e. missing questionnaires after dropout are not MCAR and consequently the probability of being censored is not completely at random. This is particularly important in this study when analysing the global health status/QL scale, as the majority of patients had censored time-to-events. In other words, only 42% of patients experienced the event of interest (i.e. deterioration of QL score). As the number of QL events is small, the power to detect a treatment difference is small. Consequently, even if large differences are expected between treatment groups, the probability of observing a significant difference is small. Thus time-to-event analyses would appear to be potentially biased and wasteful for analysing QL data.
Currently, there is no internationally agreed upon gold standard for conducting and reporting QL studies in cancer clinical trials [
32]. While other authors have also used the EORTC QLQ-C30 questionnaire in advanced gastric cancer, sometimes reporting of results was poor and was limited to a few paragraphs within the overall clinical paper [
11,
12,
14]. For example, details concerning compliance within treatment arms were not provided and methods of analysis were sub-optimal as they did not take into account the structure of the data, i.e. repeated (correlated) measurements with missing data. It is imperative that sufficient details concerning QL assessment, analysis and reporting are provided to allow comparisons of findings across studies. This is particularly relevant in diseases such as advanced gastric cancer where survival rates are similar across treatment regimens.
The use of irinotecan-based regimens for the treatment of advanced gastric cancer has been further explored in phase II studies during the last few years, especially with the availability of new targeted agents [
33‐
35]. Although initial results are promising, suggesting that IF could represent a potential platinum-free alternative backbone to be combined with new targeted agents, results from phase III studies are required before drawing any firm conclusions. QL assessment should be incorporated as a prominent objective in phase III studies in advanced gastric cancer to help both patients and physicians to discuss treatment choices and aid decision making [
6,
7].
In summary, there was a trend in favour of IF over CF in time-to-progression. The IF treatment arm also demonstrated a better safety profile than the CF arm and a better QL on a number of multi-item scales. These results would suggest that IF offers an alternative platinum-free first-line treatment option for advanced gastric cancer which should be explored further in combination with new targeted agents.
Acknowledgments
We thank the following investigators for their participation in the study: Dr. Tzekova (Bulgaria), Dr. Wang, Sun (China), Dr. Valvere (Estonia), Drs. Kellokumpu and Pyrhonen (Finland), Drs. Khayat, Ychou, Bugat, Ducreux, Rixe, Malaurie, Lam Kam Sang (France), Drs. Koehne, Clemens, Mross, Peschel, Aul (Germany), Dr. Georgoulias (Greece), Drs. Wenczl, Dank, Baki (Hungary), Drs. Rath, Catane, Figer, Schnirer, Klein, Isaacson (Israel), Drs. Bajetta, Barone, Iacono, Recchia, Cascinu, Schinzari, Pozzo (Italy), Dr. Ghosn (Lebanon), Drs. Zaluski, Nowacki, Popiela (Poland), Dr. Gorbounova (Russia), Drs. Rapoport, Landers, Jacobs, Slabber (South Africa), Drs. Perez-Manga, Germa Lluch, Massuti, Carrato (Spain), Drs. Glimelius, Starkhammar (Sweden), Dr. Su (Taiwan), Drs. Nortier, Groenewegen, Bos, Jansen, Creemers (The Netherlands), Drs. Icli, Aykan, Yilmaz, Goker, Yalcin (Turkey). Pfizer supported the quality of life analysis performed by Omega Research.