Key Points for Decision Makers

Variation in test measurement procedures can result in systematic and/or random variation in test results (i.e. measurement uncertainty).

This uncertainty can have a significant impact on the clinical utility and cost-effectiveness of testing strategies, but is not currently routinely considered within Health Technology Assessments (HTAs).

A systematic review identified a minority of HTAs (n = 20/107; 19%) that have used various approaches to incorporate the impact of components of measurement uncertainty within a pre-model assessment (n = 19; such as a literature review or laboratory survey) and/or within the economic model (n = 5).

Uncertainty remains around best practice methodology for conducting such analyses; further research is required to ensure that future HTAs are fit for purpose.

1 Background

All measurements are subject to uncertainty, whether it be determining the distance between two objects, the level of CO2 in the atmosphere or the pressure exerted within a mechanical system. In vitro clinical tests are no exception. The time of day a sample is taken, mode of sample transportation and time between sample collection and analysis are just a few examples of a multitude of factors that can influence the concentration of substances within a test sample, thereby altering the reported test value and introducing uncertainty.

The consequence of this uncertainty is that any observed test value may be different to the ‘true’ underlying target value one wishes to measure. This can impact the clinical accuracy of a test (the ability of a test to correctly identify patients with and without a given condition) if measured values are incorrectly observed as lying above or below the test cut-off threshold used to determine disease classifications. If, as a consequence, a meaningful proportion of patients receive inappropriate healthcare interventions, patients’ health may be compromised and unnecessary costs accrued. Understanding and quantifying the magnitude of test measurement uncertainty, as well as the subsequent impact on downstream test outcomes, is therefore critical in order to ensure that testing procedures are implemented only when net health benefits are expected to be obtained.

Across the developed world, the established gold-standard tool for informing evidence-based healthcare decisions is the Health Technology Assessment (HTA): a multidisciplinary process to systematically examine the safety, efficacy and cost-effectiveness of new healthcare interventions, and identify any social, organisational and ethical issues concerning adoption [1, 2]. In response to the growing importance of in vitro tests, many HTA and reimbursement authorities now include such technologies within their remit, and some institutions, such as the National Institute for Health and Care Excellence (NICE) in the UK, have established programmes of assessment for tests that are separate from those for pharmaceuticals [3, 4]. These assessments typically focus on three key domains: (1) clinical accuracy, the ability of a test to correctly identify patients with and without a given condition; (2) clinical utility, the subsequent impact of a test on health outcomes; and (3) cost-effectiveness, the ability of a test to produce an efficient impact on health outcomes in relation to healthcare expenditure.

The impact of measurement uncertainty is, in our experience, not routinely considered within HTAs. Indeed, current guidance in this area is unclear. For example, both NICE in the UK and the Canadian Agency for Drugs and Technologies in Health (CADTH), world leaders in technology assessment, make no mention of measurement uncertainty within their current methodology guidance [4, 5]. The Medical Services Advisory Committee (MSAC) in Australia is the only authority we are aware of that specifies the need to evaluate such evidence, using the associated terminology of analytic validity [6]. However, whilst stipulating that such data should be reviewed, MSAC offers no recommendations regarding how these data should be assessed or utilised within subsequent clinical and economic assessments.

In order to establish if and how measurement uncertainty is currently being addressed within HTAs, and in particular within economic evaluations, a systematic review of reports published by internationally recognised HTA agencies [registered with the International Network of Agencies for HTA (INAHTA)] and including an economic decision model was conducted. In addition, for readers unfamiliar with the field of measurement uncertainty, a brief introduction to key concepts in the field is first provided, focusing on the case of quantitative tests (i.e. measuring the quantity or concentration of analyte within a sample, typically assessed against a given disease cut-off threshold). A corresponding table of relevant terminology can be found in the Electronic Supplementary Material (section 2), and further key texts are recommended for interested readers [7,8,9,10].

2 An Introduction to Measurement Uncertainty

2.1 Precision and Trueness

The central components of measurement performance are precision [characterised by the absence of random error (i.e. imprecision) in measurement] and trueness [the absence of systematic error (i.e. bias) in measurement]. Increased imprecision and/or bias in measurement results in increased measurement uncertainty.

Imprecision [expressed as a coefficient of variation (CV) or standard deviation (SD)] is explored by observing the degree of dispersion in repeated test measurements [11,12,13]. The level of imprecision measured depends on how many factors expected to affect test performance (including time, operator, calibration, environment and equipment) are altered during the measurement procedure. Holding all factors constant (i.e. within-batch testing) measures repeatability; altering one or more factors within the same laboratory measures intermediate precision; whilst conducting testing across different laboratories (in which all factors would be expected to vary) measures reproducibility.
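As a minimal sketch of how imprecision might be summarised from repeated measurements, the following Python example computes the SD and the CV (taken here as the SD divided by the mean, expressed as a percentage). The replicate values and the function name `imprecision` are hypothetical and serve only to illustrate that results obtained under reproducibility conditions would typically be more dispersed than those obtained under repeatability conditions.

```python
from statistics import mean, stdev

def imprecision(measurements):
    """Return (SD, CV in %) for repeated measurements of the same sample."""
    sd = stdev(measurements)            # sample SD (n - 1 denominator)
    cv_percent = 100.0 * sd / mean(measurements)
    return sd, cv_percent

# Hypothetical replicate results (arbitrary units) for a single sample.
within_batch = [5.0, 5.1, 4.9, 5.0, 5.0]   # all factors held constant (repeatability)
between_labs = [5.0, 5.4, 4.6, 5.2, 4.8]   # all factors allowed to vary (reproducibility)

sd_rep, cv_rep = imprecision(within_batch)
sd_repro, cv_repro = imprecision(between_labs)

print(f"Repeatability:   SD = {sd_rep:.3f}, CV = {cv_rep:.2f}%")
print(f"Reproducibility: SD = {sd_repro:.3f}, CV = {cv_repro:.2f}%")
```

In practice, precision studies would use larger numbers of replicates and a formal design (e.g. analysis of variance) to separate the individual sources of variation; the sketch above only illustrates the summary statistics.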

Meanwhile, analysis of trueness (typically assessed according to % bias, regression analysis or difference plots) relies on comparative analysis of results from the test of interest (the index test) versus the ‘true’ target value. In reality this ‘true’ value is unknown and must be estimated using a specified reference test, ideally based on officially validated test methods or samples of known composition [but often also based on consensus data from external quality assessment (EQA) schemes or established ‘gold standard’ test results]. Alternatively, new tests may be compared against each other (without a reliable proxy for the truth) in order to ascertain the level of between-test discordance.
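The comparison of index and reference results can be illustrated with a short Python sketch. It assumes paired measurements on the same samples and uses one common definition of % bias (the mean difference relative to the mean reference value); other definitions exist. The 95% limits of agreement are the quantities usually drawn on a Bland-Altman difference plot, named here as one possible form of difference analysis. All data values and the function name `trueness_summary` are hypothetical.

```python
from statistics import mean, stdev

def trueness_summary(index, reference):
    """Summarise bias of an index test against paired reference results.

    Returns (mean difference, % bias, 95% limits of agreement).
    """
    diffs = [i - r for i, r in zip(index, reference)]
    mean_diff = mean(diffs)
    percent_bias = 100.0 * mean_diff / mean(reference)
    sd_diff = stdev(diffs)
    limits = (mean_diff - 1.96 * sd_diff, mean_diff + 1.96 * sd_diff)
    return mean_diff, percent_bias, limits

# Hypothetical paired results (arbitrary units) on five samples.
reference_results = [10.0, 20.0, 30.0, 40.0, 50.0]
index_results = [11.0, 21.0, 32.0, 41.0, 53.0]

mean_diff, percent_bias, limits = trueness_summary(index_results, reference_results)
print(f"Mean difference: {mean_diff:.2f}")
print(f"% bias: {percent_bias:.2f}%")
print(f"95% limits of agreement: {limits[0]:.2f} to {limits[1]:.2f}")
```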

An important feature in the evaluation of trueness is test selectivity: the ability of a test to identify the target analyte of interest as opposed to other sample components. Selectivity depends on the level of obstruction from substances in the test sample which either inhibit the process of binding with the target analyte (i.e. interference) or are mistaken for the target analyte, leading to ‘unintentional’ binding (i.e. cross-reactivity).

2.2 Pre-analytical, Analytical and Biological Factors

Both precision and trueness can be affected by numerous factors along the testing pathway, including (1) biological variation—fluctuations in the quantity of bodily fluids within an individual over time; (2) variation in pre-analytical factors—processes occurring prior to the point of sample analysis; and (3) variation in analytical factors—processes occurring at the point of sample analysis. These can be summarised in a ‘feather diagram’; the generalised example illustrated in Fig. 1 shows key factors grouped by category and following a (roughly) chronological order from the initial test request through to obtaining the final result.

Fig. 1 Feather diagram depicting factors that may contribute to measurement uncertainty

2.3 Limits and Range

Various limits can be specified which determine the boundaries within which testing is reasonably conducted. These are (1) the limit of blank (LoB), defined as the highest (apparent) quantity of analyte expected to be identified when processing blank samples; (2) the limit of detection (LoD), defined as the lowest quantity that can be reliably distinguished from the LoB; and (3) the limits of quantification (LoQ), defined as the lower and upper quantities a test can measure with a specified level of precision and trueness. Identified limits are routinely used to inform the reportable range of a test.
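To make the relationship between the LoB and LoD concrete, the following Python sketch applies one commonly cited parametric convention, in which the LoB is the mean of blank results plus 1.645 SDs and the LoD is the LoB plus 1.645 SDs of results from a low-concentration sample. This convention is an assumption introduced here for illustration (the definitions above do not prescribe a formula), as are the data values and function names.

```python
from statistics import mean, stdev

def limit_of_blank(blank_results, z=1.645):
    """LoB under an assumed Gaussian model: mean of blanks + z * SD of blanks."""
    return mean(blank_results) + z * stdev(blank_results)

def limit_of_detection(blank_results, low_level_results, z=1.645):
    """LoD under the same assumed model: LoB + z * SD of a low-level sample."""
    return limit_of_blank(blank_results, z) + z * stdev(low_level_results)

# Hypothetical replicate results (arbitrary units).
blanks = [0.0, 0.1, 0.2, 0.1, 0.1]      # samples containing no analyte
low_level = [0.4, 0.5, 0.6, 0.5, 0.5]   # sample with a low analyte concentration

lob = limit_of_blank(blanks)
lod = limit_of_detection(blanks, low_level)
print(f"LoB = {lob:.3f}, LoD = {lod:.3f}")
```

Under these assumptions the LoD always lies above the LoB, consistent with the definitions given above.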

2.4 Summary Measures

Different elements of uncertainty, as illustrated in Fig. 1, may be combined to estimate a summary measure of uncertainty. Two main approaches to this end have been adopted in the literature: total error (TE) and uncertainty of measurement (UM). Briefly, TE is calculated as the linear sum of bias and imprecision, in which imprecision is multiplied by a ‘z factor’ to cover a required region of confidence (e.g. at the 95% confidence level, TE = bias + 1.96 * imprecision). UM on the other hand is a measure of dispersion (i.e. SD), calculated by combining individual uncertainties occurring along the testing pathway (e.g. using propagation of error rules) multiplied by a ‘coverage factor’ to similarly capture a specified region of confidence. Whilst TE represents an upper bound on the level of deviation from truth expected to occur in a given measurement, UM defines a confidence interval around the observed result that is expected to contain the true value. Although there is an ongoing debate within the literature as to the relative merits of each approach [14,15,16,17], within the context of this study, both metrics are considered to be viable measures of overall measurement uncertainty.
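The two summary measures can be sketched in Python as follows. The TE function follows the linear sum described above, with the absolute value of bias taken as an assumption so that negative bias also increases TE. The UM function assumes that the individual uncertainty components are independent, so that they combine as a root sum of squares (one propagation of error rule), and uses a coverage factor of 2 as an assumed conventional choice. Input values and function names are hypothetical.

```python
import math

def total_error(bias, imprecision, z=1.96):
    """TE as the linear sum of |bias| and z * imprecision (same units, e.g. %)."""
    return abs(bias) + z * imprecision

def expanded_uncertainty(components, coverage_factor=2.0):
    """UM: root-sum-of-squares of independent standard uncertainties,
    multiplied by a coverage factor."""
    combined = math.sqrt(sum(u ** 2 for u in components))
    return coverage_factor * combined

# Hypothetical inputs, all expressed as percentages of the measured value.
te = total_error(bias=2.0, imprecision=3.0)   # 2.0 + 1.96 * 3.0
um = expanded_uncertainty([3.0, 4.0])         # 2 * sqrt(3^2 + 4^2)

print(f"TE = {te:.2f}%")
print(f"Expanded UM = {um:.2f}%")
```

The sketch highlights the structural difference between the two measures: TE adds bias and imprecision linearly, whereas UM combines component uncertainties in quadrature and, in this simplified form, includes no separate bias term.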

3 Methods

The review protocol was published in advance on the PROSPERO database (CRD42017056778). The primary source was the Centre for Reviews and Dissemination (CRD) HTA database; this consists of completed and ongoing HTAs from INAHTA-registered HTA authorities (49 at the time of conducting the review) in addition to 20 other CRD-recognised HTA organisations, and includes reports from national reimbursement authorities (e.g. NICE) as well as publicly funded research councils [e.g. the UK National Institute for Health Research (NIHR)]. As such it is a principal resource for HTAs expected to directly influence national healthcare decisions.

A search strategy (see the Electronic Supplementary Material, section 3) combining key terms on in vitro tests and economic decision models was developed and run in March 2017. All HTAs including a model-based economic evaluation and evaluating an in vitro test (including diagnostic, screening, prognostic, predictive and monitoring tests) across any disease area, human population or setting and reported since 1999 with a full HTA report available in English were included.

Records were managed using EndNote V 7.2 (Thomson Reuters). All titles and abstracts were screened by a primary reviewer, and 10% were independently screened by a secondary reviewer. Full papers were subsequently screened by the primary reviewer only; any uncertainties regarding inclusions were checked with the secondary reviewer. For studies identified as including an assessment of measurement uncertainty, data were extracted on the specific components assessed and the methods utilised, with 10% of data extraction independently checked by the secondary reviewer. A broad definition of measurement uncertainty was adopted, including all components listed in Fig. 1, as well as data on TE, UM, limits (LoB, LoD and LoQ), reportable range and test failure rates. Results were narratively synthesised.

In addition to the HTA database, online records of key reimbursement authorities expected to be the largest contributors of relevant HTAs (NICE, CADTH and MSAC) were cross-checked by the primary reviewer [18,19,20]. Citation checking of included studies was also conducted to identify any further relevant HTAs.

4 Results

A total of 107 studies were included (see Fig. 2), and agreement between reviewers at abstract screening was good (κ = 0.85). A summary of study characteristics is provided in the Electronic Supplementary Material (section 4). The majority of studies were conducted within the UK (62%), followed by Canada (16%) and Australia (14%), with a gradual rise in the frequency of annual HTA publications since 1999.

Fig. 2 PRISMA flow diagram of search results. CADTH Canadian Agency for Drugs and Technologies in Health, CRD Centre for Reviews and Dissemination, DAP Diagnostics Assessment Programme, HTA Health Technology Assessment, MSAC Medical Services Advisory Committee, NICE National Institute for Health and Care Excellence

Of the 107 identified HTAs, 71 (66%) did not evaluate measurement uncertainty. Sixteen (15%) incorporated data on test failure rates only (e.g. test failures included as an item within a literature review and/or as a parameter within the economic model) and were therefore of limited interest. Twenty studies (19%) considered further components of measurement uncertainty (see Table 1) [21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42]. The majority of these were published from 2009 onwards, and evaluated one or a small number of measurement uncertainty components (including imprecision, trueness, biological variability and pre-analytical or analytical effects) within some form of assessment prior to the economic model, such as a literature review or laboratory survey. These evaluations are henceforth denoted ‘pre-model assessments’. Five studies incorporated measurement uncertainty within the economic model itself: four in addition to a pre-model assessment [22, 30, 31, 39]; one within the model only [21]. These studies used a range of techniques—including individual patient simulation and Monte Carlo simulation—to incorporate data on test agreement [39], biological and analytical variability [21, 30, 31] or TE [22] (see Table 2).

Table 1 Summary of HTA reports (n = 20) including components of measurement uncertainty in a pre-model assessment and/or the economic decision model
Table 2 Details of HTA reports (n = 5) including components of measurement uncertainty within the economic model

5 Discussion

5.1 Review Findings

Despite limited guidance in this area, assessment of test measurement uncertainty has been attempted in a minority of HTAs (n = 20; 19%) indicating that such analyses are feasible.

The majority of studies (n = 19) included measurement uncertainty within some form of pre-model assessment, such as a literature review or laboratory survey. Indeed the frequency of these assessments appears to have been increasing in recent years; this may reflect the fact that more HTAs of tests are being conducted in general, a growing awareness of the importance of measurement uncertainty, and/or increasing availability of relevant data upon which to base such evaluations. On the whole, however, these studies were considered to be partial assessments: most considered one or a limited set of measurement uncertainty components and none formally assessed (i.e. beyond a general discussion) the potential quantitative impact of measurement uncertainty on clinical accuracy or utility.

A small minority of studies (n = 5) utilised data on test measurement uncertainty within the economic model. Of those, the most recent (Stein et al. 2016) was not a direct attempt to account for measurement uncertainty, but rather the authors here utilised between-test discordance data as a means of evaluating additional tests in the model [39]. Meanwhile the oldest study (Marks et al. 2000) is most interesting as an example of what not to do [21]. Here the authors simply set the proportion of false positive results equal to a given level of biological and analytical variability (i.e. imprecision), which fails to account for the dependence of test misclassifications on the position of values relative to the test cut-off threshold. In contrast, the approach taken by MSAC correctly accounted for this dependency, by first assigning ‘true’ test values, simulating the addition of measurement uncertainty to generate observed values (in this case, using TE to define a confidence interval around the true value), and then comparing these results against the given cut-off threshold to determine the proportion of misdiagnoses [22]. Similarly the more recent studies by Farmer et al. (2014) and Perera et al. (2015) simulated the addition of uncertainty on top of ‘true’ baseline values; in this case also accounting for the impact of uncertainty in the rate of baseline health and disease progression within repeated testing scenarios using regression analysis of longitudinal individual patient data [30, 31]. A key drawback with this approach, however, concerns the data and computational resources required, which would likely pose challenges within typical HTA timelines.
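The dependence of misclassification on the position of values relative to the cut-off threshold can be illustrated with a simple Monte Carlo sketch in Python. This is not the method used in any of the reviewed HTAs: it assumes Gaussian imprecision with an SD proportional to the ‘true’ value and an optional fixed proportional bias, and all parameter values, patient values and the function name `misclassification_rate` are hypothetical.

```python
import random

def misclassification_rate(true_values, cutoff, cv, bias=0.0, n_rep=2000, seed=1):
    """Share of simulated results falling on the wrong side of the cut-off.

    Assumptions (illustrative only): imprecision is Gaussian with
    SD = cv * true value, and bias is a fixed proportional shift.
    """
    rng = random.Random(seed)
    wrong = 0
    total = 0
    for true in true_values:
        truly_positive = true >= cutoff
        for _ in range(n_rep):
            # Observed value = true value shifted by bias plus random error.
            observed = true * (1.0 + bias) + rng.gauss(0.0, cv * true)
            if (observed >= cutoff) != truly_positive:
                wrong += 1
            total += 1
    return wrong / total

cutoff = 100.0
near_cutoff = [96.0, 98.0, 102.0, 104.0]       # 'true' values close to the threshold
far_from_cutoff = [50.0, 60.0, 150.0, 160.0]   # 'true' values far from the threshold

rate_near = misclassification_rate(near_cutoff, cutoff, cv=0.05)
rate_far = misclassification_rate(far_from_cutoff, cutoff, cv=0.05)
print(f"Misclassified near the cut-off:     {rate_near:.1%}")
print(f"Misclassified far from the cut-off: {rate_far:.1%}")
```

With the same level of imprecision, patients whose ‘true’ values lie near the threshold are frequently misclassified whereas those far from it are essentially never misclassified, which is why setting the false positive proportion equal to the level of imprecision is inappropriate.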

Only the MSAC study explicitly explored the impact of variation in measurement uncertainty on cost-effectiveness [22]. Here the authors found that, whilst variation in TE was not expected to alter the overall decision uncertainty (since all results remained above the specified 100,000 Australian dollars (AUS$) cost-effectiveness threshold), it was expected to have a significant impact on the base case results (resulting in a 24% drop from AUS$133,934 to AUS$101,419 per life year gained when reducing TE from 8% to 0%). This example clearly illustrates the potential impact of measurement uncertainty on cost-effectiveness, which could feasibly be of significant importance in scenarios exhibiting baseline results closer to the cost-effectiveness threshold.

5.2 Future Research

Whilst this review has identified previous assessments of measurement uncertainty within HTAs, outstanding uncertainties and issues require consideration before general guidance in this area can be feasibly implemented. For pre-model assessments, future studies would benefit from (currently lacking) guidance on best practice methods to conduct, synthesise and report literature reviews of measurement uncertainty, as well as appropriate methodology for utilising data from alternative resources (e.g. laboratory surveys, EQA reports and pathology studies). For economic evaluations, future case studies could explore particular considerations of interest including the following: the relative importance of various components of measurement uncertainty for different kinds of tests (e.g. diagnostic vs monitoring; laboratory vs point-of-care; quantitative vs qualitative); the use of alternative summary measures versus individual components of measurement uncertainty; and the feasibility of different approaches. In addition, outside the scope of HTAs, we are aware of several studies that have utilised Monte Carlo simulation methods to explore the impact of measurement uncertainty on clinical accuracy as a means of identifying test analytical performance goals (i.e. maximum allowable imprecision and/or bias in order to maintain clinical accuracy) [43,44,45,46]. Extending HTA evaluations to include similar assessments (which could feasibly be based on cost-effectiveness outputs in addition to clinical accuracy) is another potential avenue for future studies, and could further extend the clinical impact of HTAs.

5.3 Strengths and Limitations

This review focused on reports from INAHTA-registered and CRD-recognised HTA authorities, which are expected to reflect best practice methodologies and directly influence healthcare reimbursement and adoption decisions. Taking a broader perspective and considering all kinds of evidence which may inform healthcare decision making (e.g. stand-alone cost-effectiveness assessments) would likely yield additional findings of interest, as might expanding the search to reports published before 1999 (although the majority of relevant studies identified were from 2009 onwards) and to non-English languages. Nevertheless, this is the first systematic review of its kind, which highlights both advances and issues in current approaches to HTAs and can help to inform the direction of future research and guidance in this area. Furthermore, whilst the focus of this study was on in vitro tests, many of the issues highlighted here will be of relevance to pharmacological studies utilising tests as surrogate outcome measures, as well as evaluations of imaging and in vivo technologies.

6 Conclusions

Various approaches have been adopted within a minority of HTAs to assess test measurement uncertainty. Further research is required to identify best practice methodology for conducting such analyses and to ensure that future HTAs are fit for purpose.

Data Availability Statement

The dataset generated during the current study is available in the Research Data Leeds Repository (https://doi.org/10.5518/324).