Introduction

Researchers should aim to collect high-quality and complete data, as missing data may lead to loss of information in epidemiological and clinical research [1]. However, missing data are unavoidable in trials where data are collected through participant self-report. Cost data are particularly prone to missingness because economic evaluations are often “piggy-backed” onto clinical studies in which costs are rarely the primary outcome. Moreover, a single missing cost measurement results in a missing total cost estimate, because total costs are summed over all measurements.

Three types of missing data are commonly distinguished: missing completely at random (MCAR), missing at random (MAR) and missing not at random (MNAR). MCAR refers to data that are missing purely by chance, unrelated to any characteristics of the study participants; an example is a questionnaire that is accidentally lost in the mail. Data that are MCAR do not bias the results of the study, but they do decrease its power. MAR occurs when the probability that data are missing depends on other variables that are observed in the data set. Because these observed variables explain the missingness, they can be used in models to fill in the missing data. MNAR occurs when data are missing and no observed variables explain why. An example would be participants who work full-time failing to return questionnaires because they are too busy, while no information on hours worked is available in the data set. If such a characteristic is also related to the outcome of interest, the results of the study will be biased, and imputation is difficult because no observed information predicts the missingness.

Complete case analysis (CCA) is the default strategy for dealing with missing data, although it is known to produce biased estimates, large standard errors and decreased power. Oostenbrink et al. [2] and Briggs et al. [3] showed that multiple imputation techniques performed better than CCA and simple imputation techniques [conditional mean imputation, single imputation with predictive mean matching (PMM), hot decking and expectation maximization] [2, 3].

Recently, multiple imputation has been recommended as the most appropriate way of handling missing data [1, 4–7]. Multiple imputation can be a powerful tool for estimating missing data [5], but there are some important points to consider when specifying the imputation model. First, the imputation model should include all variables that explain the missing values. Second, it should include all variables included in the analysis model. Third, the imputation model must account for the distribution of the data. This last requirement may not be met when imputing cost data in trials because of the distributional issues posed by cost data: values constrained to be positive, a large proportion of zero values, and a heavily right-skewed tail.

Multiple imputation with predictive mean matching (PMM) can be a helpful tool for dealing with the skewed distribution of cost data, because PMM preserves the distribution of the data and is, therefore, robust against violations of the normality assumption [5]. Another commonly recommended approach for dealing with skewed data is to take the log of the skewed variables before imputation and then back-transform the variables to their original scale before the target analysis [4, 5, 8]. Lee and Carlin [8] compared multiple imputation with transformation and PMM for dealing with non-normality in continuous variables; they recommended transforming skewed variables to a symmetric distribution to avoid introducing bias into the study results. Another alternative is to impute missing data in two separate steps. In the first step, the probability of having costs is imputed, which takes care of the zero inflation; in the second step, an actual cost value is imputed for individuals who are predicted to have costs. The skewness of the cost data is taken into account in the second step by using the PMM algorithm, which imputes cost values from the observed cost data only [9].

It is unclear which method for imputing skewed data is the most appropriate in economic evaluations. The objective of this article was therefore to investigate which imputation strategy is most appropriate for imputing missing cost and effect data in an economic evaluation alongside a pragmatic randomized controlled trial. The strategies compared were complete case analysis (CCA), MI with predictive mean matching (MI-PMM), MI with predictive mean matching on log-transformed costs (log MI-PMM), and two-step multiple imputation with predictive mean matching (two-step MI).

Methods

Reference data set

The reference data set was obtained from two open-label randomized controlled trials evaluating the cost-utility of medical co-prescription of heroin compared with methadone maintenance treatment alone among 430 chronic, treatment-resistant heroin addicts with a follow-up period of 1 year. Psychosocial treatment was offered throughout the trials. Full details of this study are presented elsewhere [10]. Outcomes included QALYs based on the EuroQol (EQ-5D) and costs from a societal perspective [11]. The EQ-5D includes five dimensions: mobility, self-care, usual activities, pain/discomfort and anxiety/depression [11]. The respondent answers each of the five dimensions with one of three possible responses: ‘no problems’, ‘some problems’ or ‘severe problems’. Each participant completed the EQ-5D at baseline and at months 6, 10 and 12 during treatment. The health states from the EQ-5D were subsequently converted to utilities using the York tariff [12]. We calculated QALYs by multiplying the utility of each health state by the time between two measurements and summing the results over the 12-month treatment period. Cost estimates were measured through clinical report forms and the European version of the Addiction Severity Index (EuropASI) [13], which collected data on the use of healthcare resources, travel related to the programme and illegal activities. The EuropASI was completed at the same intervals as the EQ-5D. The cost categories were valued according to Dutch guidelines [14]. Occasional missing values were imputed using last observation carried forward, resulting in a complete data set for all 430 participants, which was considered the complete reference data set.
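In symbols, with \(u_j\) the utility observed at measurement \(j\) and \(\Delta t_j\) the time in years between measurements \(j\) and \(j+1\), this calculation amounts to \(\text{QALY} = \sum_{j} u_j \, \Delta t_j\), summed over the measurement intervals of the 12-month treatment period.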

Missing data

The second author generated missing data in the complete data set using R statistical software [15, 16]. With this program [16], multivariate incomplete data can be generated according to the MAR mechanism, which means that the creation of missing data is independent of the imputation models that were evaluated. We used a linear combination of the observed data to determine the probability of having missing data for each person in the data set.
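As an illustration, this step can be sketched in R with the ampute() function from the mice package (whether this is the exact tool cited in [16] is left open; the data frame below is a hypothetical stand-in for the trial variables):

```r
library(mice)

# Hypothetical stand-in for the complete reference data set
set.seed(42)
complete_data <- data.frame(
  group            = rbinom(430, 1, 0.5),            # treatment arm
  age              = rnorm(430, 40, 8),
  centre           = rbinom(430, 1, 0.5),
  baseline_utility = runif(430, 0, 1),
  qaly             = runif(430, 0, 1),
  costs            = rgamma(430, shape = 2, scale = 5000)
)

# ampute() generates multivariate incomplete data; with mech = "MAR" the
# probability of being incomplete is a weighted linear combination of the
# observed variables, as described in the text. The patterns argument can
# restrict missingness to the QALY and cost variables.
amp <- ampute(complete_data, prop = 0.35, mech = "MAR")
incomplete_data <- amp$amp  # data set in which 35% of cases are incomplete
```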

Three incomplete data sets were created with 17, 35 and 50 % missing data to investigate the effect of the rate of incomplete data on the performance of the imputation methods. We chose these percentages to reflect low, medium and high amounts of missing data that might influence the results of the analysis [17]. It has been shown that missing data under 10 % will not affect the results of the analysis considerably [17]. Even with 50 % missing data, multiple imputation can result in valid inferences on the data [5].

Missing data points were created in the QALY variable and several cost variables. The probability of missing data was related to other variables in the data to satisfy the missing at random (MAR) assumption. Centre, location, age, administration of a second interview, and abstinence were predictors of missingness in the utility and cost outcome variables for the data set with 17 % missing data. In the data set with 35 % missing data, the predictor variables were treatment group, centre, sex, age, and occurrence of a second interview. In the data set with 50 % missing data, the predictor variables were treatment group, centre, age and occurrence of a second interview. Table 2 presents all key cost variables with missing data for the different missing data scenarios.

Missing data strategies

CCA

In CCA, analysis was restricted to participants with complete cost and effect data. This resulted in smaller sample sizes than in the reference data set (see Table 2).

Multiple imputation procedure

Multiple imputation was done using fully conditional specification. Fully conditional specification, also known as chained equations, is a flexible multivariate approach that does not rely on the assumption of multivariate normality [5]. Regression models are specified for each variable with missing values, conditional on all of the other variables in the imputation model, and imputations are generated by drawing from these iterated conditional models [5].

The imputed values were estimated using the predictive mean matching (PMM) algorithm, which matches each missing value to the observed value with the closest predicted estimate [4]. The predicted mean is estimated in a regression equation to which a random residual term is added in order to account for the uncertainty of the missing data. Instead of using the predicted estimate itself, PMM randomly selects the imputed value from the observed values closest to the predicted estimate. For example, suppose an older single man is missing a blood pressure measurement and his value is predicted to be 102.34 mmHg by regressing blood pressure on age and sex. Five other older single men have observed blood pressures of 103, 103, 102, 101 and 104 mmHg; the missing value is then imputed with a random draw from these five blood pressures. PMM has several advantages when imputing cost data: it is more robust against non-normal data because it uses the observed distribution of the data, and it imputes only plausible values because it draws from observed values. The process of estimating imputed values is repeated in sequential cycles, called iterations, each time using the updated data with the imputed estimates from the previous cycle. One of these iterations (e.g. the 100th) was selected and used as an imputed data set, until ‘m’ data sets had been selected in total. We used 200 imputations to minimize internal variation, so that imputation variation would not affect the performance of each imputation method [1, 18–20]. We performed MI using the mi impute chained command in Stata 12, which uses fully conditional specification to perform the multiple imputations [21].
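For illustration, the same FCS-with-PMM workflow can be sketched with R’s mice package, continuing the hypothetical incomplete_data from the sketch above (the paper used Stata; the settings mirror those in the text):

```r
library(mice)

# Fully conditional specification with predictive mean matching for every
# incomplete variable; m = 200 imputations and 100 iterations as in the text.
# The paper additionally imputed each treatment arm separately, e.g. by
# splitting the data on the group variable and stacking the results.
imp <- mice(incomplete_data,
            method    = "pmm",
            m         = 200,
            maxit     = 100,
            seed      = 1234,
            printFlag = FALSE)
```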

We performed the multiple imputations stratified by treatment group to maintain any group effect in the data. For all multiple imputation strategies, we checked the convergence plots to verify that the iterations were free from trend and that the imputations were successful. To solve convergence problems, we merged highly correlated variables: travel costs were merged with total programme costs (correlation coefficient >0.9), and in-patient hospital consultations were merged with in-patient length of hospital stay.

Three multiple imputation strategies were compared and are described below.

1. MI-PMM: in the first multiple imputation strategy, we performed multiple imputation with predictive mean matching on the raw data.

2. Log MI-PMM: in the second multiple imputation strategy, we applied the predictive mean matching algorithm to the log-transformed cost data. A constant was first added to the raw cost data to circumvent problems when transforming zero values, and the log was then taken. After imputation, the completed data were transformed back to their original scale before any analyses were performed.

3. Two-step MI: the third multiple imputation strategy was a conditional two-step approach. We recoded cost variables into dummy variables, coded 1 if a subject had costs and 0 otherwise. Missing values on these dummies were multiply imputed as either 0 or 1 using a logit function. Next, multiple imputation with the PMM algorithm was performed for missing cases with a value of 1 on the dummy variable, using only cases with cost estimates greater than zero in this imputation step. For variables that did not have a sufficient number of zeroes to warrant the conditional imputation, we applied only the second step to the raw cost variable. A sketch of this two-step procedure is given after this list.
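For concreteness, a minimal sketch of this conditional two-step idea in R’s mice package, continuing the hypothetical variables from the earlier sketches (this illustrates the mechanics and is not the authors’ exact implementation):

```r
library(mice)

d <- incomplete_data
# Step 1 variable: any costs (1) versus no costs (0); NA where costs are missing
d$has_cost <- factor(ifelse(is.na(d$costs), NA, as.integer(d$costs > 0)))
# Step 2 variable: positive costs only, so PMM donors are observed positive values
d$pos_cost <- ifelse(!is.na(d$costs) & d$costs > 0, d$costs, NA)
d$costs    <- NULL                 # rebuilt after imputation

meth <- make.method(d)
meth["has_cost"] <- "logreg"       # step 1: logistic imputation of the dummy
meth["pos_cost"] <- "pmm"          # step 2: PMM on the positive costs

imp2 <- mice(d, method = meth, m = 200, maxit = 100, seed = 1234,
             printFlag = FALSE)

# Rebuild the cost variable in each completed data set: zero when the dummy
# is 0, otherwise the PMM-imputed positive value
completed <- lapply(seq_len(imp2$m), function(i) {
  ci <- complete(imp2, i)
  ci$costs <- ifelse(ci$has_cost == "0", 0, ci$pos_cost)
  ci
})
```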

Statistical analysis

We used a generalized linear regression model with a gamma distribution and an identity link to estimate mean differences in total costs. The gamma distribution was chosen to take into account the right skewness of the cost data. The generalized linear model for quality adjusted life years (QALYs) was adjusted for baseline utility estimates. Mean differences and standard errors were pooled using Rubin’s rules [20].
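Continuing the mice sketch, the analysis models can be fitted in each imputed data set and pooled with Rubin’s rules (variable names remain hypothetical):

```r
# Cost difference between arms: gamma GLM with identity link so that the
# coefficient stays on the euro scale, pooled with Rubin's rules
fit_costs <- with(imp, glm(costs ~ group, family = Gamma(link = "identity")))
summary(pool(fit_costs))

# QALY model adjusted for baseline utility, as described in the text
fit_qaly <- with(imp, glm(qaly ~ group + baseline_utility))
summary(pool(fit_qaly))
```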

We estimated the correlation between the incremental total costs and the incremental QALYs in the reference data set and the imputed data sets. In the multiple imputation strategies, the covariance between total costs and QALYs was calculated based on the Fisher z transformation and was then pooled using Rubin’s rules [5, 22].
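A sketch of this pooling step (the standard Fisher z approach for scalar estimates; not necessarily the authors’ exact code):

```r
# Correlation between costs and QALYs in each imputed data set, pooled on
# the Fisher z scale with Rubin's rules, then back-transformed
m  <- imp$m
rs <- sapply(seq_len(m), function(i) {
  ci <- complete(imp, i)
  cor(ci$costs, ci$qaly)
})
zs       <- atanh(rs)                       # Fisher z transformation
z_bar    <- mean(zs)                        # pooled z
within   <- 1 / (nrow(incomplete_data) - 3) # sampling variance of z
between  <- var(zs)                         # between-imputation variance
z_var    <- within + (1 + 1 / m) * between  # Rubin's total variance
r_pooled <- tanh(z_bar)                     # pooled correlation
```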

Incremental cost-effectiveness ratios (ICERs) were calculated using the pooled cost and effect estimates. The ICER is calculated as \(\frac{\hat{\Delta}_c}{\hat{\Delta}_e}\), where \(\hat{\Delta}_c\) is the difference in total costs between the two intervention groups and \(\hat{\Delta}_e\) is the difference in QALYs between the two intervention groups.

Incremental net benefit (INB) estimates were calculated using the following formula: \(\hat{b}(\lambda) = \hat{\Delta}_e \lambda - \hat{\Delta}_c\) [23, 24], where \(\hat{\Delta}_e\) is the difference in QALYs between the two intervention groups, \(\lambda\) is the willingness to pay, and \(\hat{\Delta}_c\) is the difference in costs. The variance of the INB was calculated using \(V[\hat{b}(\lambda)] = \hat{V}(\hat{\Delta}_e)\lambda^{2} + \hat{V}(\hat{\Delta}_c) - 2\lambda\hat{C}(\hat{\Delta}_e, \hat{\Delta}_c)\), where \(\hat{C}\) is the covariance between the differences in total costs and QALYs [23, 24]. We set the willingness to pay at €30,000 because this is roughly equivalent to the cut-off value (£20,000–£30,000 per QALY) mentioned in the National Institute for Clinical Excellence guidelines for economic evaluations [25].
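As a worked sketch, with delta_e, delta_c, var_e, var_c and cov_ec as placeholders for the pooled QALY difference, cost difference, their variances and their covariance:

```r
# INB and its variance at the willingness to pay used in the paper
lambda  <- 30000
inb     <- delta_e * lambda - delta_c                      # b(lambda)
inb_var <- var_e * lambda^2 + var_c - 2 * lambda * cov_ec  # V[b(lambda)]
inb_se  <- sqrt(inb_var)
ci95    <- inb + c(-1.96, 1.96) * inb_se                   # 95% CI
```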

Cost-effectiveness acceptability curves (CEACs) were estimated to quantify the uncertainty due to sampling and measurement error, and because \(\lambda\) is generally unknown. The CEAC plots the probability that co-prescribed heroin compared with methadone maintenance alone is cost-effective (y-axis) as a function of the amount society might be willing to pay for one additional QALY (x-axis). The pooled coefficients and variance parameters from the regression models were used for the CEACs.
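Continuing the sketch above, the CEAC is the probability P(INB > 0) under a normal approximation, evaluated over a grid of threshold values:

```r
# CEAC: probability of cost-effectiveness as a function of willingness to pay
lambdas <- seq(0, 80000, by = 1000)
ceac <- sapply(lambdas, function(l) {
  b <- delta_e * l - delta_c
  v <- var_e * l^2 + var_c - 2 * l * cov_ec
  pnorm(b / sqrt(v))                # P(INB > 0)
})
plot(lambdas, ceac, type = "l",
     xlab = "Willingness to pay per QALY (euros)",
     ylab = "Probability cost-effective")
```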

Comparison of strategies

The estimates from the reference data set were considered the “true values”, and we compared the estimates from the different multiple imputation strategies with these true values. The primary outcomes of interest were the value of the INB at a willingness to pay of €30,000 per QALY, the standard error of the INB, and the probability that co-prescribed heroin is cost-effective compared with methadone maintenance at that willingness to pay. We evaluated the percentage of bias from the reference analysis (RA) in the different imputation strategies for cost and effect differences, standard error estimates, p values and t values. The strategies that gave estimates closest to the reference data set were considered best.

Sensitivity analysis

Research suggests that it is better to impute at the item level rather than the total level [26, 27]. As a sensitivity analysis, we therefore also imputed the total cost variable directly for all missing data strategies.

Results

Costs

Table 1 contains baseline characteristics and the variables used to calculate the utilities. Total costs consisted of programme costs, law enforcement costs, costs of damage to victims, health-related travel costs and other healthcare costs. Table 2 presents the frequency distributions of each cost category in the reference data set and the data sets with missing values. Table 3 presents the cost estimates for the reference case, the CCA, and the different imputation strategies for 17, 35 and 50 % missing data. The cost difference of −€12,792 in the RA fell within the 95 % confidence intervals of all multiple imputation strategies for all rates of missing data. The CCA deviated the most from the RA of all strategies, specifically with regard to the cost differences and the associated standard errors in all scenarios. For 17 % missing data, the CCA showed a statistically significant difference in costs, as in the reference analysis; for 35 and 50 % missing data, however, the cost difference was no longer statistically significant. The multiple imputation strategies gave similar results to each other in the 17 and 35 % missing data sets, with smaller cost differences and larger standard errors than the reference analysis as the amount of missing data increased. The log MI-PMM deviated the least from the RA in the 50 % missing data set for the cost difference, standard error and p values. The two-step MI deviated the most from the RA with regard to the cost differences and standard errors in the data set with 50 % missing data.

Table 1 Baseline characteristics of the reference data set
Table 2 Descriptive statistics of the cost variables (euros) for the reference data set and the data sets with missing values
Table 3 Overview of cost estimates for the missing data methods

QALYs

Table 4 provides the QALY results for 17, 35 and 50 % missing data. In the 17 % missing data set, all strategies deviated from the reference analysis by roughly the same amount for the difference in QALYs and its standard error. All strategies, including CCA, showed a statistically significant difference (p < 0.001) in QALYs between the two intervention groups.

Table 4 Overview of clinical effect estimates from the QALY model for the missing data methods

In the data set with 35 % missing data, the QALY coefficient in the CCA deviated the least and the most deviation occurred in the log MI-PMM, but the reference coefficient was still contained in all confidence intervals. The standard error of the CCA deviated the most from the standard error in the RA while the MI-PMM deviated the least. All strategies still showed that co-prescribed heroin was associated with higher QALY scores compared to methadone maintenance.

In the 50 % missing data set, the QALY coefficient deviated the most in the MI-PMM and the least in the CCA but the regression coefficient from the RA was still within all 95 % confidence intervals. The standard error for the CCA deviated the most from the reference analysis, but the deviation in all MI strategies was similar. The CCA was the only strategy where the difference in QALYs was no longer statistically significant.

Cost-utility analysis

Figure 1 and Table 5 show the ICERs, the INB and its variance, and the probability that co-prescribed heroin compared with methadone maintenance is cost-effective at a threshold value of €30,000/QALY for the 17, 35 and 50 % missing data sets. In the 17 % missing data scenario, the CCA showed the largest deviation from the RA for the INB, its standard error, and the ICER. The INB in the two-step MI strategy deviated the least from the INB in the reference analysis. The standard error deviated similarly for all imputation strategies, and the reference value of the INB was contained in the confidence intervals of all of them. The probability of co-prescribed heroin being cost-effective compared with methadone maintenance was 99 % at a willingness-to-pay threshold of €30,000 per QALY gained, regardless of the imputation strategy.

Fig. 1

Incremental net benefit (in euros) coefficients for a threshold value of €30,000 based on the amount of missing data and imputation method. MI-PMM multiple imputation with predictive mean matching, log-MI-PMM multiple imputation with predictive mean matching on log-transformed costs, MI-PMM 2 step two-step multiple imputation with predictive mean matching

Table 5 Cost effectiveness analysis estimates for the missing data methods

In the 35 % missing data scenario, the CCA deviated the most from the RA for the ICER, INB and its standard error, and the probability that the intervention was cost effective. The MI-PMM deviated least from the RA for the INB standard error compared to the other imputation strategies. The probability of co-prescribed heroin being cost-effective compared with methadone maintenance was 97 % for a willingness-to-pay threshold value of €30,000 for a one-unit gain in QALY score for all multiple imputation strategies versus 99 % for the RA (CCA was 90 %).

In the scenario with 50 % missing data, the INB was no longer statistically significant for the CCA. The log MI-PMM showed the least deviation from the RA in the INB coefficient, its standard error, and the probability that the intervention was cost-effective. The probability of co-prescribed heroin being cost-effective compared with methadone maintenance at €30,000/QALY was 97 %.

The reference INB was within the 95 % confidence intervals of all imputation strategies (see Fig. 1). For the CCA, the INB was no longer statistically significant with 35 and 50 % missing data. The INB decreased and uncertainty grew with higher rates of missing data, as evidenced by larger standard errors and wider confidence intervals in all strategies. The log MI-PMM showed the least uncertainty around the INB in all missing data scenarios. Figure 2 presents the CEACs for the different strategies with 50 % missing data and shows pronounced differences between the strategies in this scenario. The probability that co-prescribed heroin is cost-effective at a threshold value of zero is 98 % for the reference analysis, 94 % for the log MI-PMM and MI-PMM, 92 % for the two-step MI and 78 % for the CCA. This increases to 99, 97, 97, 96 and 82 % for the RA, MI-PMM, log MI-PMM, two-step MI and CCA, respectively, at a threshold value of €30,000/QALY.

Fig. 2

Cost-effectiveness acceptability curves for threshold value of €30,000 based on the 50 % missing data scenario. PMM Multiple imputation with predictive mean matching, Log multiple imputation with predictive mean matching on log-transformed costs, 2step two-step multiple imputation with predictive mean matching

Sensitivity analysis

The imputation procedure was applied to the total costs directly for the MI-PMM and the log MI-PMM. Precision decreased, with larger standard errors and a greater percentage of bias in the cost difference from the reference analysis, when multiple imputation was applied to the total costs rather than to the sub-cost variables (data not shown).

Discussion

Main findings

In this study, we evaluated the performance of different multiple imputation strategies and CCA for scenarios with varying rates of missing data in costs and effects in a pragmatic economic evaluation. We found that for all rates of missing data, the multiple imputation strategies performed better than CCA. The results of the CCA, MI-PMM and two-step MI were all influenced by the amount of missing data. With larger amounts of missing data, the log MI-PMM deviated the least from the RA for the cost difference and its standard error, the INB and its standard error, and the probability that co-prescribed heroin was cost-effective compared with methadone maintenance at a willingness to pay of €30,000 per QALY. Therefore, the log MI-PMM is considered most appropriate for imputing missing cost and effect data. However, when considering QALYs, the MI-PMM performed best, since it deviated the least from the RA with increasing amounts of missing data. Overall, the log MI-PMM was least affected by the amount of missing data.

Our results imply that addressing only the right-skewness of the data with a log transformation combined with PMM is sufficient, and that strategies to deal with zero inflation, such as our two-step MI, are not needed. The results are also consistent with the advice in the literature to apply a log transformation when imputing skewed data [4, 5, 8].

We expected beforehand that the two-step MI strategy would perform better because it controls for the large number of zeroes as well as the skewness in the data. In practice, however, there were no relevant differences with the other multiple imputation strategies, and the two-step MI was more difficult to apply than the log MI-PMM. Not all software packages offer a comprehensive way to apply the two-step MI strategy, whereas the log MI-PMM is easily applied and available in packages such as SPSS, Stata, SAS and R.

Comparison with existing literature

Our study adds to the findings from other studies that multiple imputation is better than CCA for dealing with missing data in economic evaluations [2, 3, 8, 28, 29]. However, in contrast to Briggs et al. [3], Oostenbrink et al. [2] and Burton et al. [28], we had information on the observed values of the missing data, because we created the missing data ourselves using the MAR assumption. This allowed us to estimate the deviation of the different imputation strategies from the original complete data set.

Yu et al. [29] showed in a simulation study that predictive mean matching in R and STATA performed reasonably well and maintained the underlying distribution of the resource use data [29]. However, they did not evaluate the effect of the different imputation strategies on the cost-effectiveness estimates.

Faria et al. [30] created a structured approach and practical guidance on how to handle missing data on costs and health outcomes, comparing inverse probability weighting, multiple imputation and likelihood-based methods. They concluded that multiple imputation was flexible to use and allowed for flexible sensitivity analyses. However, they did not compare the different types of multiple imputation strategies for economic evaluations that we did.

Strengths and limitations

Our study adds to previous studies by focusing on the estimation of incomplete costs, QALYs and cost-effectiveness, and by comparing different MI strategies using MICE with PMM in Stata. Additionally, we pooled the correlation between costs and utilities after multiple imputation using Fisher’s z transformation to calculate cost-effectiveness [5, 22]. We used fully conditional specification with PMM, which gave us more flexibility around assumptions of normality [31].

Other strengths of this study were its systematic and applied approach using real data to examine the performance of different multiple imputation strategies in situations with varying amounts of missing data. To our knowledge, this is one of the first studies to compare the two-step MI strategy with other multiple imputation strategies for cost-effectiveness evaluations.

As we used only one data set, our evaluation was limited to direct comparisons with the true coefficients instead of averages over repeated simulations, which might reduce generalizability to other scenarios and contexts. We did, however, perform a small simulation pilot study repeating the imputation procedures to verify the stability of the methods. This was done by repeatedly drawing samples of 100 cases from each of our incomplete data sets and applying our methods to these small samples. We ran 1000 simulations using 15 imputations and 20 iterations. For each method and incomplete data condition, the average over the 1000 simulations was compared with the complete reference data results. This simulation confirmed the relative differences between the performance of the methods presented in this study. Future research should perform a larger simulation study and vary the proportion of zeroes to see how this affects the performance of the missing data methods; with a greater proportion of zeroes, the two-part model may become more beneficial than the other methods. We also assumed the same missingness mechanism in both treatment arms, which future simulations should vary using simulated data.
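A compact skeleton of this pilot, reusing the hypothetical names from the earlier sketches (the pooled cost difference stands in for the full set of performance measures, and convergence guards for the gamma model are omitted):

```r
# 1000 draws of 100 cases; 15 imputations and 20 iterations per draw
set.seed(2016)
sim_estimates <- replicate(1000, {
  s   <- incomplete_data[sample(nrow(incomplete_data), 100), ]
  si  <- mice(s, method = "pmm", m = 15, maxit = 20, printFlag = FALSE)
  fit <- with(si, glm(costs ~ group, family = Gamma(link = "identity")))
  summary(pool(fit))$estimate[2]   # pooled group coefficient for this draw
})
mean(sim_estimates)                # average over simulations vs. reference value
```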

Implications for further research

Prospective economic evaluations alongside trials play an important role in providing decision makers with cost-effectiveness information to inform reimbursement decisions, so it is important that they provide robust and unbiased information. The choice of imputation strategy can affect policy decisions. In this study, co-prescribed heroin treatment was cost-effective in comparison with methadone maintenance under all strategies evaluated, although uncertainty increased with the amount of missing data. In situations with smaller differences between groups, the decision may change depending on the imputation procedure chosen.

In conclusion, we recommend the use of the log MI-PMM because of its ease of use and its reliable results with increasing amounts of missing data. The log MI-PMM also appears to perform well for zero-inflated data, provided a constant is added to the data before log transformation so that zero values can be handled.