
2016 | Book

Statistical Applications from Clinical Trials and Personalized Medicine to Finance and Business Analytics

Selected Papers from the 2015 ICSA/Graybill Applied Statistics Symposium, Colorado State University, Fort Collins


About this book

The papers in this volume represent a broad, applied swath of advanced contributions to the 2015 ICSA/Graybill Applied Statistics Symposium of the International Chinese Statistical Association, held at Colorado State University in Fort Collins. The contributions cover topics that range from statistical applications in business and finance to applications in clinical trials and biomarker analysis. Each paper was peer-reviewed by at least two referees and by an editor. The conference was attended by over 400 participants from academia, industry, and government agencies around the world, including North America, Asia, and Europe.

Table of Contents

Frontmatter

Biomarker and Personalized Medicine

Frontmatter
Optimal Biomarker-Guided Design for Targeted Therapy with Imperfectly Measured Biomarkers
Abstract
Targeted therapy revolutionizes the way physicians treat cancer and other diseases, enabling them to adaptively select individualized treatment according to the patient’s biomarker profile. Implementing targeted therapy requires that the biomarkers be accurately measured, which may not always be feasible in practice. In this article, we propose two optimal biomarker-guided trial designs in which the biomarkers are subject to measurement error. The first design focuses on each patient’s individual benefit and minimizes the treatment assignment error, so that each patient has the highest probability of being assigned to the treatment that matches his or her true biomarker status. The second design focuses on the group benefit and maximizes the overall response rate for all patients enrolled in the trial. We develop a likelihood ratio test to evaluate the subgroup treatment effects at the end of the trial. Simulation studies show that the proposed optimal designs achieve our design goals and yield desirable operating characteristics.
Yong Zang, Ying Yuan
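A minimal sketch of the intuition behind the first design, assuming a simple Bayes-rule assignment for a binary biomarker; the function names and numerical inputs are hypothetical, and the chapter's actual optimal design involves more than this:

```python
# Hypothetical sketch: Bayes-rule treatment assignment with an
# imperfectly measured binary biomarker. Given the assay's sensitivity
# and specificity and the marker prevalence, assign targeted therapy
# when the posterior probability of true biomarker-positive status
# exceeds 1/2, minimizing that patient's assignment-error probability.
def prob_true_positive(obs_positive, prevalence, sensitivity, specificity):
    """Posterior P(true status = positive | observed assay result)."""
    if obs_positive:
        num = sensitivity * prevalence
        den = num + (1 - specificity) * (1 - prevalence)
    else:
        num = (1 - sensitivity) * prevalence
        den = num + specificity * (1 - prevalence)
    return num / den

def assign_targeted(obs_positive, prevalence, sensitivity, specificity):
    return prob_true_positive(obs_positive, prevalence,
                              sensitivity, specificity) > 0.5

# Example: 30% prevalence, 90% sensitivity, 80% specificity
print(assign_targeted(True, 0.30, 0.90, 0.80))   # posterior ~0.66 -> True
print(assign_targeted(False, 0.30, 0.90, 0.80))  # posterior ~0.05 -> False
```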
Statistical Considerations for Evaluating Prognostic Biomarkers: Choosing Optimal Threshold
Abstract
The use of biomarkers is increasingly popular in cancer research, and various imaging biomarkers have recently been developed as prognostic markers. In practice, a threshold or cutpoint is required to dichotomize a continuous marker so as to distinguish patients with certain conditions or responses from those without. Two popular ROC-based methods for establishing an “optimal” threshold are the Youden index J and the closest-to-top-left criterion. In this paper we show the importance of acknowledging the inherent variance of such estimates. In addition, a purely data-driven search for an optimal threshold can produce estimates that are not necessarily meaningful because of their large variance. Instead, we propose estimating the threshold through a pre-specified criterion, such as a fixed level of specificity. The confidence intervals of the threshold, and of the sensitivity at the pre-specified specificity, are much narrower than those for the quantities derived from either the Youden index J or the closest-to-top-left criterion. We suggest estimating the threshold at a pre-specified level of specificity, together with the sensitivity at that threshold, and accompanying all estimates by appropriate 95% confidence intervals.
Zheng Zhang
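For readers who want to experiment with the three criteria discussed above, here is a small sketch (NumPy only) implementing the textbook definitions of Youden's J, the closest-to-top-left criterion, and a threshold fixed at a pre-specified specificity with a bootstrap CI for the corresponding sensitivity; the data and settings are illustrative, not the chapter's.

```python
import numpy as np

def roc_points(controls, cases):
    """Empirical (threshold, sensitivity, specificity) triples for a
    continuous marker where larger values indicate disease."""
    grid = np.unique(np.concatenate([controls, cases]))
    sens = np.array([(cases >= t).mean() for t in grid])
    spec = np.array([(controls < t).mean() for t in grid])
    return grid, sens, spec

def youden_threshold(controls, cases):
    grid, sens, spec = roc_points(controls, cases)
    return grid[np.argmax(sens + spec - 1)]          # maximize Youden's J

def closest_topleft_threshold(controls, cases):
    grid, sens, spec = roc_points(controls, cases)
    return grid[np.argmin((1 - sens)**2 + (1 - spec)**2)]

def fixed_spec_threshold(controls, cases, spec_level=0.90,
                         n_boot=2000, seed=1):
    """Threshold fixed at the spec_level quantile of the controls, with
    a bootstrap percentile CI for the sensitivity at that threshold."""
    rng = np.random.default_rng(seed)
    t = np.quantile(controls, spec_level)
    sens_boot = [(rng.choice(cases, cases.size, replace=True)
                  >= np.quantile(rng.choice(controls, controls.size,
                                            replace=True), spec_level)).mean()
                 for _ in range(n_boot)]
    lo, hi = np.percentile(sens_boot, [2.5, 97.5])
    return t, (cases >= t).mean(), (lo, hi)

# Illustrative data: marker shifted upward in cases
rng = np.random.default_rng(0)
controls, cases = rng.normal(0, 1, 200), rng.normal(1, 1, 100)
print(youden_threshold(controls, cases))
print(fixed_spec_threshold(controls, cases, spec_level=0.90))
```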
Accuracy of Meta-Analysis Using Different Levels of Diagnostic Accuracy Measures
Abstract
Diagnostic studies report results as sensitivity and specificity or as receiver operating characteristic (ROC) curves. Meta-analysis synthesizes these diagnostic accuracy measures from different studies to obtain an overall summary ROC curve. Increasingly, meta-analysis also uses individual patient-level data; however, the pros and cons of such an approach are not entirely clear. In this paper, we performed a simulation study to evaluate the accuracy of summary ROC curves derived from different types of data: the paired sensitivity and specificity from individual studies, the study-specific ROC curves, and the individual patient-level data. Extensive simulation experiments were conducted under various settings to compare the empirical performance of summary ROC curves estimated from these three levels of data. The simulation results demonstrated that the method based on the ROC curves reported by individual studies provides an accurate and robust summary ROC curve compared with the alternatives, including those based on patient-level data, and is preferred in practice.
Yanyan Song, Ying Lu, Lu Tian
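As one simple (and deliberately naive) way to pool reported study-level ROC curves, the sketch below vertically averages study-specific true-positive rates on a common false-positive-rate grid; the chapter's summary ROC estimator may well be more sophisticated than this.

```python
import numpy as np

def summary_roc(roc_curves, fpr_grid=None):
    """roc_curves: list of (fpr, tpr) array pairs, one per study, each
    sorted by increasing fpr. Returns the pointwise-average summary TPR
    on a common grid (vertical averaging)."""
    if fpr_grid is None:
        fpr_grid = np.linspace(0.0, 1.0, 101)
    tprs = [np.interp(fpr_grid, fpr, tpr) for fpr, tpr in roc_curves]
    return fpr_grid, np.mean(tprs, axis=0)
```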

Bayesian Methods and Applications

Frontmatter
Bayesian Frailty Models for Multi-State Survival Data
Abstract
Multi-state models can be viewed as generalizations of both the standard and competing risks models for survival data. Models for multi-state data have been the theme of many recently published works. Motivated by bone marrow transplant data, we develop a Bayesian model using the gap times between two successive events in a path of events experienced by a subject. Path-specific frailties are introduced to capture the dependence among the gap times sharing the same path with two or more states. In this study, we focus on a single terminal event. Under improper prior distributions for the parameters, we establish propriety of the posterior distribution. An efficient Gibbs sampling algorithm is developed for sampling from the posterior distribution. A bone marrow transplant data set is analyzed in detail to demonstrate the proposed methodology.
Mário de Castro, Ming-Hui Chen, Yuanye Zhang
Bayesian Integration of In Vitro Biomarker to Analysis of In Vivo Safety Assessment
Abstract
Prolongation of the QT interval of the electrocardiogram (ECG) is a critical safety concern in drug development. To enhance prediction of a drug’s QT prolongation risk in humans, it has been proposed to better integrate in vitro and in vivo models for preclinical QT prolongation assessment (Hanson et al., J Pharmacol Toxicol Methods 54:116–129, 2006). Based on an evaluation of the Health and Environmental Sciences Institute of the International Life Sciences Institute (ILSI/HESI) data set, Chiang and Wang (Stat Biopharm Res 7:66–75, 2015) proposed a Bayesian approach to incorporating in vitro information into in vivo animal QT analysis. The approach has been shown to increase predictive power, improve decision making, and reduce unnecessary exposure in animal studies. In this chapter, we extend the previous work of Chiang and Wang (Stat Biopharm Res 7:66–75, 2015) to further investigate how in vitro data can be integrated through the proposed approach and how decisions concerning QT prolongation can be made more informatively for drugs moving from preclinical to clinical evaluation.
Ming-Dauh Wang, Alan Y. Chiang
A Phase II Trial Design with Bayesian Adaptive Covariate-Adjusted Randomization
Abstract
Adaptive randomization (e.g., response-adaptive (RA) randomization) has become popular in phase II clinical research because of its flexibility and efficiency; it also has the patient-centric advantage of assigning fewer patients to inferior treatment arms. However, these designs lack a mechanism to actively control the imbalance of prognostic factors, i.e., covariates that substantially affect the study outcome. Improving the balance of patient characteristics among the treatment arms could potentially increase the statistical power of the trial. We propose a phase II clinical trial design that is response-adaptive and also actively balances the covariates across treatment arms. We then incorporate this method into a sequential RA randomization design such that the resulting design skews the allocation probability toward the better treatment arm while also controlling the imbalance of the prognostic factors across the arms. The proposed method extends existing randomization procedures, which either require polytomizing continuous covariates or use a fixed allocation probability to adjust for covariate imbalance. Simulation studies compare the operating characteristics of the design with existing approaches and illustrate our recommendations for clinical practice.
Jianchang Lin, Li-An Lin, Serap Sankoh
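The two ingredients of such a design can be sketched as follows — a hypothetical illustration, not the authors' algorithm: a response-adaptive component based on Beta posteriors for a binary outcome, and a Pocock-Simon-style marginal imbalance penalty for the new patient's covariate levels, blended by an assumed weight w.

```python
import numpy as np
from scipy.stats import beta

def alloc_prob_A(succ, fail, new_covs, assigned_covs, assigned_arm,
                 gamma=0.5, w=0.3):
    """succ/fail: length-2 successes/failures on arms (A, B); new_covs:
    the new patient's covariate levels; assigned_covs: 2-D array
    (patients x covariates) of levels already assigned; assigned_arm:
    0/1 array (0 = A). Returns P(assign the new patient to arm A)."""
    # (i) response-adaptive part: P(pA > pB) under Beta(1+s, 1+f) posteriors
    draws_A = beta.rvs(1 + succ[0], 1 + fail[0], size=4000)
    draws_B = beta.rvs(1 + succ[1], 1 + fail[1], size=4000)
    p_better = (draws_A > draws_B).mean()
    ra = p_better**gamma / (p_better**gamma + (1 - p_better)**gamma)
    # (ii) covariate part: signed marginal imbalance (A minus B counts)
    # summed over the new patient's covariate levels
    imb = 0
    for k, level in enumerate(new_covs):
        same = assigned_covs[:, k] == level
        imb += (assigned_arm[same] == 0).sum() - (assigned_arm[same] == 1).sum()
    bal = 1.0 / (1.0 + np.exp(imb))   # shrink P(A) when A is over-represented
    return (1 - w) * ra + w * bal     # blended allocation probability

covs = np.array([[0, 1], [1, 1], [0, 0]])
arms = np.array([0, 1, 0])
print(alloc_prob_A([5, 3], [5, 7], [0, 1], covs, arms))
```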

Dose Ranging Studies in Clinical Trials

Frontmatter
Sample Size Allocation in a Dose-Ranging Trial Combined with PoC
Abstract
In recent years, the pharmaceutical industry has experienced many challenges in discovering and developing new drugs, including long clinical development timelines with significant investment risks. In response, many sponsors are working to speed up the clinical development process. One strategy is to combine the proof of concept (PoC) and dose-ranging clinical studies into a single trial in early Phase II development. One important question in designing such a trial is how to calculate its sample size. In most early Phase II development programs, budget and ethical concerns may limit the total sample size for the trial. This manuscript discusses various ways of allocating the sample size to each treatment group under a given total sample size, as well as the performance of different contrast tests for PoC.
Qiqi Deng, Naitee Ting
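As a concrete, simplified instance of the allocation question, the sketch below computes the power of a single pre-specified contrast test for PoC with a normal endpoint and known sigma under a chosen per-arm allocation; the means, contrast, and allocations are illustrative, not the manuscript's.

```python
import numpy as np
from scipy.stats import norm

def contrast_power(mu, n, c, sigma=1.0, alpha=0.025):
    """mu: true arm means (placebo first); n: per-arm sample sizes;
    c: contrast coefficients summing to zero. One-sided Z test."""
    mu, n, c = map(np.asarray, (mu, n, c))
    assert abs(c.sum()) < 1e-9
    se = sigma * np.sqrt((c**2 / n).sum())
    ncp = (c @ mu) / se                       # noncentrality of the Z test
    return 1 - norm.cdf(norm.ppf(1 - alpha) - ncp)

# Example: placebo + 3 doses, linear-trend contrast, two allocations
mu = [0.0, 0.1, 0.3, 0.4]
c = [-3, -1, 1, 3]
print(contrast_power(mu, [40, 40, 40, 40], c))   # equal allocation
print(contrast_power(mu, [64, 32, 32, 32], c))   # placebo-heavy allocation
```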
Personalized Effective Dose Selection in Dose Ranging Studies
Abstract
We consider the problem of predicting the personalized minimum effective dose and estimating the dose-dependent optimal subgroups in dose-ranging studies. Our research is motivated by a real randomized, double-blind, placebo-controlled phase II dose-ranging study with genetic markers. One goal of the analysis is to identify subgroups with enhanced benefit/risk profiles at appropriate doses and to inform the study design of future phase III trials. To the best of our knowledge, this problem has not been systematically studied before. We propose a novel framework to nonparametrically model the dose-dependent biomarker-outcome relationship and to estimate the personalized effective dose and dose-dependent optimal subgroups. Our proposed method will be useful for identifying responder subgroups and their accompanying doses for future study design. We illustrate the proposed method with simulation studies; our method compares favorably to two ad hoc approaches.
Xiwen Ma, Wei Zheng, Yuefeng Lu

Innovative Clinical Trial Designs and Analysis

Frontmatter
Evaluation of Consistency Requirements in Multi-Regional Clinical Trials with Different Endpoints
Abstract
In recent years, there has been an increasing trend toward conducting multi-regional clinical trials (MRCTs) for drug development in the pharmaceutical industry. A carefully designed MRCT can support a new drug’s approval in different regions simultaneously. The primary objective of an MRCT is to investigate the drug’s overall efficacy across regions while also assessing the drug’s performance in specific regions. In order to claim the study drug’s efficacy and obtain approval in a specific region, the local regulatory authority may require the sponsor to provide evidence of consistency in the treatment effect between the overall patient population and the local region. Usually, the region-specific consistency requirement needs to be pre-specified before the start of the study, and the consistency in treatment effect between the region(s) of interest and the overall population is evaluated at the final analysis. In this paper, we evaluate the consistency requirements in multi-regional clinical trials for different endpoints: continuous, binary, and survival. We also compare different consistency requirements for the same endpoint when multiple requirements are enforced, and we make recommendations for each endpoint based on these comprehensive considerations.
Zhaoyang Teng, Jianchang Lin, Bin Zhang
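For the continuous-endpoint case, a Monte Carlo evaluation of one widely cited requirement (an MHLW "Method 1"-type criterion: observed regional effect at least pi times the overall effect) can be sketched as follows; the parameter values are illustrative, and this is not necessarily the exact criterion studied in the chapter.

```python
import numpy as np

def consistency_prob(delta, sigma, n_total, frac_region, pi=0.5,
                     n_sim=100_000, seed=7):
    """P(observed regional effect >= pi * observed overall effect) for a
    continuous endpoint with a common true effect delta across regions;
    n_total is the per-arm total sample size."""
    rng = np.random.default_rng(seed)
    n_r = int(n_total * frac_region)             # per-arm regional size
    n_o = n_total - n_r                          # per-arm rest-of-world size
    d_r = rng.normal(delta, sigma * np.sqrt(2 / n_r), n_sim)
    d_o = rng.normal(delta, sigma * np.sqrt(2 / n_o), n_sim)
    d_all = (n_r * d_r + n_o * d_o) / n_total    # pooled overall effect
    return (d_r >= pi * d_all).mean()

# Example: true effect 0.3, sigma 1, 200/arm overall, 20% in the region
print(consistency_prob(0.3, 1.0, 200, 0.20))
```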
A Statistical Decision Framework Applicable to Multipopulation Tailoring Trials
Abstract
Interest in tailored therapeutics has led to innovations in trial design and analysis. Trials now range from traditional overall-population trials with exploratory subgroup analysis, to single-population tailoring trials, to multipopulation tailoring trials. This paper presents an overview of the trial options and provides a framework for decision making in confirmatory multipopulation tailoring trials.
Brian A. Millen
Assessing Benefit and Consistency of Treatment Effect Under a Discrete Random Effects Model in Multiregional Clinical Trials
Abstract
The traditional assumption of a uniform treatment effect may be inappropriate in a multiregional clinical trial (MRCT) because regional differences can affect the drug effect. Lan and Pinheiro (2012) proposed a discrete random effects model (DREM) to account for the heterogeneity of treatment effects among regions. The benefit of the overall drug effect and the consistency of the treatment effect in each region are two major issues in MRCTs. In this article, we derive the power for demonstrating benefit under DREM and determine the overall sample size for an MRCT. We also compare DREM with the traditional continuous random effects model (CREM). To assess treatment benefit and consistency simultaneously under DREM, we use the concept of Method 2 in the “Basic Principles on Global Clinical Trials” guidance to construct the probability function of benefit and consistency. We also optimize the sample size allocation to reach maximum power for benefit and consistency.
Jung-Tzu Liu, Chi-Tian Chen, K. K. Gordon Lan, Chyng-Shyan Tzeng, Chin-Fu Hsiao, Hsiao-Hui Tsou
Design and Analysis of Multiregional Clinical Trials in Evaluation of Medical Devices: A Two-Component Bayesian Approach for Targeted Decision Making
Abstract
Current statistical design and analysis of multiregional clinical trials for medical devices generally follows a paradigm in which the treatment effect of interest is assumed to be consistent between US and OUS (outside-US) regions. In this paper, we discuss situations where the treatment effect might vary between US and OUS regions, and we propose a two-component Bayesian approach for targeted decision making. In this approach, the anticipated treatment difference between US and OUS regions is formally taken into account by design, hopefully leading to increased transparency and predictability of targeted decision making.
Yunling Xu, Nelson Lu, Ying Yang
Semiparametric Analysis of Interval-Censored Survival Data with Median Regression Model
Abstract
The analysis of interval-censored survival data has become increasingly popular and important in many areas, including clinical trials and biomedical research. Generally, right-censored survival data can be seen as a special case of interval-censored data. However, due to the fundamentally special and complex nature of interval censoring, most of the commonly used survival analysis methods for right-censored data, including methods based on martingale theory (Andersen et al., Statistical Models Based on Counting Processes. Springer, New York, 1992), cannot be used for analyzing interval-censored survival data. Most popular semiparametric models for interval-censored survival data focus on modeling the hazard function. In this chapter, we develop a semiparametric model for the median regression function of interval-censored survival data, which offers many practical advantages in real applications. Both a semiparametric maximum likelihood estimator (MLE) and a Markov chain Monte Carlo (MCMC) based semiparametric Bayesian estimator, including how to incorporate historical information, are proposed and presented. We illustrate the methods with a real breast cancer data example and compare the different models. Key findings and recommendations are also discussed to provide further guidance on applications in clinical trials.
Jianchang Lin, Debajyoti Sinha, Stuart Lipsitz, Adriano Polpo
Explained Variation for Correlated Survival Data Under the Proportional Hazards Mixed-Effects Model
Abstract
Measures of explained variation are useful in scientific research, as they quantify the amount of variation in an outcome variable of interest that is explained by one or more other variables. We develop such measures for correlated survival data under the proportional hazards mixed-effects model (PHMM). Since different approaches have been studied in the literature outside the classical linear regression model, we investigate four sample-based measures that estimate three different population coefficients. We show that although the three population measures are not the same, they reflect similar amounts of variation explained by the predictors. Among the four sample-based measures, we show that the first one (R²), which is the simplest to compute, is consistent for the first population measure (Ω²) under the usual asymptotic scenario in which the number of clusters tends to infinity; the other three sample-based measures additionally require that the cluster sizes be large. We study the properties of the measures through simulation studies and illustrate their usage on a multi-center clinical trial data set.
Gordon Honerkamp-Smith, Ronghui Xu
Some Misconceptions on the Use of Composite Endpoints
Abstract
Composite endpoints are frequently used as primary endpoints in clinical trials. However, there are some misconceptions about their use. This paper identifies these misconceptions and discusses how to avoid them.
Jianjun (David) Li, Jin Xu

Clinical and Safety Monitoring in Clinical Trials

Frontmatter
A Statistical Model for Risk-Based Monitoring of Clinical Trials
Abstract
Risk-based monitoring allows monitors of clinical trial sites to focus their visits on the sites with the greatest potential for risk reduction. Here we present a statistical model that recommends sites for the monitor to visit. The model uses a pre-visit assessment supplied by the monitor, as well as other measurable factors, to predict the monitor’s post-visit assessment of the risk reduction resulting from the visit. The monitor is then directed to visit the sites with the highest predicted risk reduction. We demonstrate the properties of this model using a simulation that compares two strategies for directing monitors: one relies on the model, while the other relies only on the monitor’s pre-visit assessments. The simulation demonstrates that the model-based strategy can direct monitors to sites with greater potential for risk reduction. Finally, we discuss alternative models as well as potential pitfalls of risk-based monitoring.
Gregory J. Hather
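A stripped-down version of the model-based strategy might look like the following sketch (the feature set is hypothetical; the chapter's model may use different predictors): regress realized post-visit risk reduction on pre-visit assessments and site factors, then rank candidate sites by predicted reduction.

```python
import numpy as np

def fit_and_rank(X_hist, y_hist, X_candidates, n_visits=3):
    """Fit OLS of realized post-visit risk reduction on site features
    (e.g., pre-visit score, enrollment, query rate) and return indices
    of the n_visits candidate sites with the largest predicted reduction."""
    X1 = np.column_stack([np.ones(len(X_hist)), X_hist])
    beta, *_ = np.linalg.lstsq(X1, y_hist, rcond=None)   # OLS fit
    Xc = np.column_stack([np.ones(len(X_candidates)), X_candidates])
    return np.argsort(Xc @ beta)[::-1][:n_visits]
```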
Blinded Safety Signal Monitoring for the FDA IND Reporting Final Rule
Abstract
We introduce a safety monitoring procedure for two-arm blinded clinical trials. This procedure incorporates a Bayesian hierarchical model for using prior information and pooled event rates to make inferences on the rate of adverse events of special interest in the test treatment arm. We describe a collaborative process for specifying the prior and calibrating the operating characteristics.
Greg Ball, Patrick M. Schnell
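One way to see the mechanics of blinded monitoring is the following sketch, which is not the authors' hierarchical model: with pooled blinded events under 1:1 randomization, the blinded count is Poisson with rate proportional to the sum of the two arm rates; placing Gamma priors on both rates (the control prior informed by historical data) and using importance sampling yields the posterior probability that the test-arm rate exceeds a reporting threshold. All numbers below are made up.

```python
import numpy as np
from scipy.stats import poisson

def prob_test_rate_exceeds(x_blinded, exposure, threshold,
                           a_c=8.0, b_c=100.0, a_t=2.0, b_t=50.0,
                           n_draw=200_000, seed=11):
    """P(lam_test > threshold | blinded pooled events) by self-normalized
    importance sampling from Gamma(a, rate=b) priors on the arm rates;
    exposure is total person-time across both arms (1:1 randomization)."""
    rng = np.random.default_rng(seed)
    lam_c = rng.gamma(a_c, 1.0 / b_c, n_draw)   # control-arm AE rate
    lam_t = rng.gamma(a_t, 1.0 / b_t, n_draw)   # test-arm AE rate
    w = poisson.pmf(x_blinded, exposure / 2.0 * (lam_c + lam_t))
    w = w / w.sum()                             # importance weights
    return float(w[lam_t > threshold].sum())

# Example: 19 blinded events over 400 total person-years
print(prob_test_rate_exceeds(19, 400.0, threshold=0.08))
```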

Statistical Applications in Nonclinical and Preclinical Drug Development

Frontmatter
Design and Statistical Analysis of Multidrug Combinations in Preclinical Studies and Phase I Clinical Trials
Abstract
Multidrug combination is an important therapeutic approach for cancer, viral and microbial infections, hypertension, and other diseases involving complex biological networks. Synergistic drug combinations, which are more effective than predicted by summing the effects of the individual drugs, often achieve an increased therapeutic index. Because drug effect is dose-dependent, multiple doses of each individual drug need to be examined, yielding a rapidly increasing number of combinations and a challenging high-dimensional statistical modeling problem. The lack of proper design and analysis methods for multi-drug combination studies has resulted in many missed therapeutic opportunities. Although systems biology holds the promise of unveiling complex interactions within biological systems, knowledge of these networks has until very recently remained predominantly topological. This article summarizes recent work on efficient maximal-power experimental designs for multi-drug combinations and on statistical modeling of the resulting data. The design and analysis of a vorinostat and cytarabine combination study is presented to illustrate the approach. We then introduce a model-based adaptive Bayesian phase I trial design for drug combinations utilizing the same modeling concept. To tackle the challenging problem of combinations of more than three drugs, we present a novel two-stage procedure that starts with an initial selection using an in silico model, built upon experimental data on the single drugs and current systems biology information, to obtain maximum likelihood estimates.
Ming T. Tan, Hong-Bin Fang, Hengzhen Huang, Yang Yang
Statistical Methods for Analytical Comparability
Abstract
In all manufacturing settings, there is an inherent drive to improve the product by reducing process variation, implementing new technology, increasing efficiency, optimizing resources, and improving the customer experience through innovation. In the pharmaceutical industry, these improvements come with an added responsibility to the patient: product made under the post-improvement or post-change condition must maintain the safety and efficacy of the pre-change product. Regulatory agencies recognize the importance of providing manufacturers the flexibility to improve their manufacturing processes (FDA, Guidance Concerning Demonstration of Comparability of Human Biological Products, 1996; ICH Q5E, ICH Guidance for Industry: Q5E Comparability of Biotechnological/Biological Products Subject to Changes in Their Manufacturing Process, 2005). They also acknowledge that some changes may not require additional clinical studies to demonstrate safety and efficacy, so that implementation may be more efficient and expeditious to the benefit of patients. When clinical studies are not necessary, a minimum requirement remains: to demonstrate that the post-change product is comparable to the pre-change product. This comparison is known as analytical comparability. Analytical comparability may be demonstrated through statistical and non-statistical methods; the choice of methodology is not defined by the guidance documents. This paper presents an overview of equivalence tests and statistical intervals as options for demonstrating analytical comparability.
Leslie Sidor
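As an illustration of the equivalence-testing option, here is a standard two one-sided tests (TOST) sketch for comparing a quality attribute between pre- and post-change lots; the equivalence margin delta and alpha are assumptions that would need scientific justification, and the data are invented.

```python
import numpy as np
from scipy import stats

def tost(pre, post, delta, alpha=0.05):
    """Two one-sided t-tests of H0: |mean(post) - mean(pre)| >= delta.
    Returns (overall p-value, (1 - 2*alpha) CI for the difference,
    equivalent?). Pooled-variance two-sample setting."""
    pre, post = np.asarray(pre, float), np.asarray(post, float)
    n1, n2 = pre.size, post.size
    diff = post.mean() - pre.mean()
    sp2 = ((n1 - 1) * pre.var(ddof=1) + (n2 - 1) * post.var(ddof=1)) \
          / (n1 + n2 - 2)
    se = np.sqrt(sp2 * (1 / n1 + 1 / n2))
    df = n1 + n2 - 2
    p_low = 1 - stats.t.cdf((diff + delta) / se, df)  # H0: diff <= -delta
    p_high = stats.t.cdf((diff - delta) / se, df)     # H0: diff >= +delta
    p = max(p_low, p_high)
    half = stats.t.ppf(1 - alpha, df) * se            # 90% CI when alpha=0.05
    return p, (diff - half, diff + half), p < alpha

pre = [99.8, 100.1, 100.4, 99.6, 100.0]
post = [100.2, 100.5, 99.9, 100.3, 100.1]
print(tost(pre, post, delta=1.5))
```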
Statistical Applications for Biosimilar Product Development
Abstract
Regulatory approval of biosimilar products requires demonstration of analytical similarity of functional and structural attributes between the proposed biosimilar product and the on-market reference product. A statistical framework for evaluating analytical similarity data has recently been published, and U.S. regulatory guidance is expected soon. This paper illustrates the challenges and issues encountered by Hospira (a Pfizer company) in implementing this newly described statistical framework to support analytical similarity assessments for biosimilar products. A simulation approach using multilevel (hierarchical) linear regression is also proposed to statistically derive shelf-life specification limits. The approach may be applicable when a larger volume of data can be generated as part of the analytical similarity assessment. The performance of the simulation approach is compared for limited versus sufficiently large sample sizes and for quality attributes with low versus high analytical variability. The proposed simulation approach for calculating shelf-life specification limits is also benchmarked against an approach commonly used in industry based on a fixed-effects analysis of covariance (ANCOVA) model.
Richard Montes, Bryan Bernat, Catherine Srebalus-Barnes
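A rough sketch of the simulation idea for shelf-life limits (the parameter values are placeholders, not Hospira's, and the chapter's multilevel model is richer): draw lot-level intercepts and degradation slopes from a fitted hierarchical model, project each simulated lot to the shelf-life time point, and set specification limits from quantiles of the projected attribute.

```python
import numpy as np

def shelf_life_limits(mu_int, mu_slope, cov, resid_sd, t_shelf,
                      n_lots=100_000, q=(0.005, 0.995), seed=3):
    """Quantiles of the simulated attribute at the shelf-life time point,
    drawing lot-level (intercept, slope) pairs from the fitted model."""
    rng = np.random.default_rng(seed)
    params = rng.multivariate_normal([mu_int, mu_slope], cov, size=n_lots)
    y_end = (params[:, 0] + params[:, 1] * t_shelf
             + rng.normal(0.0, resid_sd, n_lots))   # attribute at expiry
    return np.quantile(y_end, q)

# Example: potency near 100%, slope about -0.05%/month, 24-month shelf life
cov = [[0.25, 0.0], [0.0, 0.0004]]
print(shelf_life_limits(100.0, -0.05, cov, resid_sd=0.5, t_shelf=24))
```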

Statistical Learning Methods and Applications with Large-scale Data

Frontmatter
A Statistical Method for Change-Set Analysis
Abstract
In many scientific studies, it is of interest to group spatial units on a lattice with similar characteristics within a group but with distinction among groups. Here we develop a novel change-set method for this purpose, as a substantive extension of the existing change-point analysis for one-dimensional data in space or time. Our method addresses unique challenges resulting from the multi-dimensional space such as changes that occur abruptly in space and change sets of arbitrary shapes. In particular, we propose an entropy measure and establish quasi-likelihood estimation that accounts for covariates via change-set regression and spatial correlation via working covariance. For illustration, our method is applied to analyze a county-based socio-economic data set.
Pei-Sheng Lin, Jun Zhu, Shu-Fu Kuo, Katherine Curtis
An Alarm System for Flu Outbreaks Using Google Flu Trend Data
Abstract
Outbreaks of influenza pose a serious threat to communities and hospital resources. It is important for health care providers not only to know the seasonal trend of influenza, but also to be alerted as soon as possible when unusual outbreaks occur, enabling more efficient, proactive resource allocation. Google Flu Trends data show a good match in trend patterns, albeit not in exact occurrences, with the proportion of physician visits attributed to influenza reported by the Centers for Disease Control and Prevention (CDC), and hence provide a timely, inexpensive data source for developing an alarm system for influenza outbreaks. For the State of Connecticut, using weekly Google Flu Trends data from 2003 to 2012, an exponentially weighted moving average (EWMA) control chart was developed after removing the seasonal trend from the observed data. The control chart was tested on the 2013–2015 CDC data and was able to issue an alarm at the unusually early outbreak in the 2012–2013 season.
Gregory Vaughan, Robert Aseltine, Sy Han Chiou, Jun Yan
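The pipeline described above can be sketched in a few lines (a hedged illustration; the smoothing constant lambda and limit multiplier L below are generic defaults, not the values tuned for Connecticut): deseasonalize the weekly series, run the EWMA recursion, and flag weeks exceeding the time-varying upper control limit.

```python
import numpy as np

def ewma_alarms(y, week_of_year, lam=0.2, L=3.0):
    """Indices of weeks where the EWMA of the deseasonalized series
    exceeds the time-varying upper control limit."""
    y = np.asarray(y, float)
    week_of_year = np.asarray(week_of_year)
    seasonal = np.array([y[week_of_year == w].mean() for w in week_of_year])
    r = y - seasonal                                  # remove seasonal trend
    mu, sd = r.mean(), r.std(ddof=1)
    z = np.empty_like(r)
    z[0] = mu
    for t in range(1, r.size):
        z[t] = lam * r[t] + (1 - lam) * z[t - 1]      # EWMA recursion
    i = np.arange(1, r.size + 1)
    ucl = mu + L * sd * np.sqrt(lam / (2 - lam) * (1 - (1 - lam) ** (2 * i)))
    return np.where(z > ucl)[0]                       # alarm indices
```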
Identifying Gene-Environment Interactions with a Least Relative Error Approach
Abstract
For complex diseases, the interactions between genetic and environmental risk factors can have important implications beyond the main effects. Many of the existing interaction analyses are marginal and cannot accommodate the joint effects of multiple main effects and interactions. In this study, we conduct a joint analysis that can simultaneously accommodate a large number of effects. Significantly different from the existing studies, we adopt loss functions based on relative errors, which offer a useful alternative to “classic” methods such as least squares and least absolute deviation. Further, to accommodate censoring in the response variable, we adopt a weighted approach. Penalization is used for identification and regularized estimation. Computationally, we develop an effective algorithm that combines majorize-minimization and coordinate descent. Simulation shows that the proposed approach has satisfactory performance. We also analyze lung cancer prognosis data with gene expression measurements.
Yangguang Zang, Yinjun Zhao, Qingzhao Zhang, Hao Chai, Sanguo Zhang, Shuangge Ma
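One common relative-error criterion of the kind described above is the least absolute relative error (LARE) loss with a lasso-type penalty; the display below is a generic penalized, weighted LARE objective, and the chapter's exact loss, censoring weights, and penalty may differ.

```latex
\hat{\beta} \;=\; \arg\min_{\beta}\;
\sum_{i=1}^{n} w_i
\left\{ \left|\frac{y_i - x_i^{\top}\beta}{y_i}\right|
      + \left|\frac{y_i - x_i^{\top}\beta}{x_i^{\top}\beta}\right| \right\}
\;+\; \lambda \sum_{j} |\beta_j|
```

Here the $w_i$ are weights accommodating censoring (e.g., inverse-probability-of-censoring weights) and $\lambda$ tunes the sparsity penalty that identifies the relevant main effects and interactions.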
Partially Supervised Sparse Factor Regression For Multi-Class Classification
Abstract
The classical linear discriminant analysis (LDA) may perform poorly in multi-class classification with high-dimensional data. We propose a partially supervised sparse factor regression (PSFAR) approach, to jointly explore the potential low-dimensional structures in the high-dimensional class mean vectors and the common covariance matrix required in LDA. The problem is formulated as a multivariate regression analysis, with predictors constructed from the class labels and responses from the high-dimensional features. The regression coefficient matrix is then composed of the class means, for which we explore a sparse and low rank structure; we further explore a parsimonious factor analysis representation in the covariance matrix. As such, our model assumes that the high-dimensional features are best separated in their means in a low-dimensional subspace, subject to a few unobserved latent factors. We propose a regularized log-likelihood criterion for model estimation, for which an efficient Expectation-Maximization algorithm is developed. The efficacy of PSFAR is demonstrated by both simulation studies and a real application using handwritten digit data.
Chongliang Luo, Dipak Dey, Kun Chen

Statistical Applications in Business and Finance

Frontmatter
A Bivariate Random-Effects Copula Model for Length of Stay and Cost
Abstract
Copula models and random effects models are becoming increasingly popular for modeling dependencies or correlations between random variables. Recent applications appear in fields such as economics, finance, insurance, and survival analysis. We give a brief overview of the principles of construction of copula models, from the Farlie-Gumbel-Morgenstern, Gaussian, and Archimedean families to the Frank, Clayton, and Gumbel families. We develop a flexible joint model in which correlated errors are modeled by copulas, and we incorporate a cluster-level random effect to account for within-cluster correlations. In an empirical application, our proposed approach attempts to capture the various dependence structures of hospital length of stay and cost (symmetric or asymmetric) in the copula function. It takes advantage of the relative ease of specifying the marginal distributions and of introducing within-cluster correlation through cluster-level random effects.
Xiaoqin Tang, Zhehui Luo, Joseph C. Gardiner
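To make the construction concrete, the sketch below simulates one hospital's (length of stay, cost) pairs from a Clayton copula via the Marshall-Olkin frailty construction, with a shared hospital-level random effect shifting both margins; all distributions and parameter values are invented for illustration and are not the fitted model.

```python
import numpy as np
from scipy import stats

def simulate_hospital(n_pat, theta=2.0, re_sd=0.3, seed=None):
    """Simulate (LOS, cost) for one hospital: Clayton(theta) copula via
    the Marshall-Olkin frailty construction; a shared hospital-level
    random effect b shifts both margins on the log scale."""
    rng = np.random.default_rng(seed)
    b = rng.normal(0.0, re_sd)                        # cluster random effect
    g = rng.gamma(1.0 / theta, 1.0, n_pat)            # frailty ~ Gamma(1/theta)
    e1, e2 = rng.exponential(size=(2, n_pat))
    u = (1.0 + e1 / g) ** (-1.0 / theta)              # Clayton-dependent U
    v = (1.0 + e2 / g) ** (-1.0 / theta)
    los = stats.lognorm.ppf(u, s=0.5, scale=np.exp(1.5 + b))     # days
    cost = stats.gamma.ppf(v, a=2.0, scale=np.exp(8.0 + b) / 2)  # dollars
    return los, cost

los, cost = simulate_hospital(500, seed=0)
print(np.corrcoef(np.log(los), np.log(cost))[0, 1])   # positive dependence
```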
Backmatter
Metadata
Title
Statistical Applications from Clinical Trials and Personalized Medicine to Finance and Business Analytics
Editors
Jianchang Lin
Bushi Wang
Xiaowen Hu
Kun Chen
Ray Liu
Copyright Year
2016
Electronic ISBN
978-3-319-42568-9
Print ISBN
978-3-319-42567-2
DOI
https://doi.org/10.1007/978-3-319-42568-9
