About this book

This volume presents 27 selected papers on topics that range from statistical applications in business and finance to applications in clinical trials and biomarker analysis. All papers feature original, peer-reviewed content. The editors intentionally selected papers that cover many topics so that the volume will serve the whole statistical community and a variety of research interests. The papers represent select contributions to the 21st ICSA Applied Statistics Symposium. The International Chinese Statistical Association (ICSA) Symposium took place between the 23rd and 26th of June, 2012 in Boston, Massachusetts. It was co-sponsored by the International Society for Biopharmaceutical Statistics (ISBS) and the American Statistical Association (ASA). This is the inaugural proceedings volume to share research from the ICSA Applied Statistics Symposium.



Applications in Business and Finance


Constructing and Evaluating an Autoregressive House Price Index

We examine house price indices, focusing on an S&P/Case-Shiller-based index and an autoregressive method. Issues including the effect of gap time on sales and the use of hedonic information are addressed. Furthermore, predictive ability is incorporated as a quantitative metric into the analysis using data from home sales in Columbus, Ohio. When comparing the two indices, the autoregressive method is found to have the better predictive capabilities while accounting for changes in the market due to both single-sale and repeat-sales homes in a statistical model.
Chaitra H. Nagaraja, Lawrence D. Brown

On Portfolio Allocation: A Comparison of Using Low-Frequency and High-Frequency Financial Data

Portfolio allocation is one of the most fundamental problems in finance. The process of determining the optimal mix of assets to hold in the portfolio is a very important issue in risk management. It involves dividing an investment portfolio among different assets based on the volatilities of the asset returns. In recent decades, estimating volatilities of asset returns from high-frequency data has gained popularity in financial economics. However, there is ongoing debate about when and how we gain from using high-frequency data in portfolio optimization. This paper starts with a review of portfolio allocation and high-frequency financial time series. Then we introduce a new methodology to carry out efficient asset allocations using regularization on estimated integrated volatility via intra-day high-frequency data. We illustrate the methodology by comparing the results of both low-frequency and high-frequency price data on stocks traded on the New York Stock Exchange over a period of 209 days in 2010. The numerical results show that portfolios constructed using the high-frequency approach generally perform well by pooling together the strengths of regularization and estimation from a risk management perspective.
Jian Zou, Hui Huang

Applications of Functional Dynamic Factor Models

Accurate forecasting of zero coupon bond yields for a continuum of maturities is paramount to bond portfolio management and derivative security pricing. Yet a universal model for yield curve forecasting has been elusive, and prior attempts often resulted in a tradeoff between goodness-of-fit and consistency with economic theory. To address this, we propose a novel formulation which connects the dynamic factor model (DFM) framework with concepts from functional data analysis: a DFM with functional factor loading curves. This results in a model capable of forecasting functional time series. Further, in the yield curve context we show that the model retains economic interpretation. We show that our model performs very well on forecasting actual yield data compared with existing approaches, especially in regard to profit-based assessment for an innovative trading exercise. We further illustrate the viability of our model for applications outside of yield forecasting.
Spencer Hays, Haipeng Shen, Jianhua Z. Huang

Mixed Modeling for Physician-Direct Campaigns

Today, pharmaceutical companies leverage multiple channels, such as sales rep visits, samples, professional journals, and even consumer mass media, to reach out to physicians in order to increase product awareness and drug knowledge to gain incremental market share or prescribing volume. Measuring the influence of each individual channel is critical for future planning and resource optimization. The analytic challenge occurs when physicians are exposed to multiple channels simultaneously and the impact of each channel may have a different life span. Traditional ANCOVA (analysis of covariance) is no longer sufficient to see the whole picture. In this paper, we present a mixed modeling approach to longitudinal data to answer two important business questions: (1) How effective are different channels in promoting sales? (2) How should we allocate resources across multiple channels?
Wei Huang, Lynda S. Gordon

Uplift Modeling Application and Methodology in Database Marketing

While there is a broad consensus that incrementality is the accurate measurement of database marketing impact, few marketing activities today focus on the uplift effect, because most targeted campaigns are selected using propensity models that maximize gross response or demand. In this paper, we introduce a tree-based uplift modeling methodology, which optimizes true marketing profitability. We also discuss the major stages involved in this approach, with a real-life example from analytic services in the specialty retail industry.
Junjun Yue

Biomarker Analysis and Personalized Medicine


Designing Studies for Assessing Efficacy in Mixture Populations

In personalized medicine, the patient population is thought of as a mixture of two or more subgroups that might derive differential efficacy from a drug. A key decision is which subgroup, or union of subgroups, the drug should be developed for. Interestingly, for some common measures of efficacy, the value for a mixture population may not be representable as a function of efficacy for the subgroups and their prevalence. This chapter describes study designs that lead to probabilistic models under which the relative risk (or odds ratio) for a mixture population can be represented as a function of the relative risk (or odds ratio) for the subgroups and their prevalence.
Szu-Yu Tang, Eloise Kaizar, Jason C. Hsu

Estimating Subject-Specific Treatment Differences for Risk-Benefit Assessment with Applications to Beta-Blocker Effectiveness Trials

In the recent past, several clinical trials have sought to evaluate the effectiveness of beta-blocking drugs in patients with chronic heart failure. Although the studies of certain drugs in this class yielded overwhelmingly positive results, other studies resulted in a much less clear interpretation. As a result, attention has appropriately been placed on the impact of patient heterogeneity on treatment assessment. For clinical practice, it is desirable to identify subjects who would benefit from the new treatment from a risk-benefit perspective. In this paper, we investigate the results of the noted Beta-Blocker Evaluation of Survival Trial (BEST) and implement a systematic approach to achieve this goal by analyzing data available early in the study, at the time of a hypothetical initial interim analysis. We utilize multinomial outcome data from these initial patients to build a parametric score for the purpose of stratifying the remaining patients in the BEST study. We then use the data from the remaining BEST study participants to obtain a nonparametric estimate of the treatment effects, with respect to each of several ordered patient outcomes that encompass both risks and benefits of treatment, for any fixed score. Furthermore, confidence interval and band estimates are constructed to quantify the uncertainty of our inferences for the treatment differences over a range of scores. We indeed detect subsets of patients who experience significant treatment benefits in addition to other patient groups who appear to be poor candidates for treatment.
Brian Claggett, Lu Tian, Lihui Zhao, Davide Castagno, Lee-Jen Wei

Missing Data in Principal Surrogacy Settings

When an outcome of interest in a clinical trial is late-occurring or difficult to obtain, good surrogate markers can reliably extract information about the effect of the treatment on the outcome of interest. Surrogate measures are obtained post-randomization, and thus the surrogate–outcome relationship may be subject to unmeasured confounding. Thus Frangakis and Rubin (Biometrics 58:21–29, 2002) suggested assessing the causal effect of treatment within “principal strata” defined by the counterfactual joint distribution of the surrogate marker under the treatment arms. Li et al. (Biometrics 66:523–531, 2010) elaborated this suggestion for binary markers and outcomes, developing surrogacy measures that have causal interpretations and utilizing a Bayesian approach to accommodate non-identifiability in the model parameters. Here we extend this work to accommodate missing data under ignorable and non-ignorable settings, focusing on latent ignorability assumptions (Frangakis and Rubin, Biometrika 86:365–379, 1999; Peng et al., Biometrics 60:598–607, 2004; Taylor and Zhou, Biometrics 65:88–95, 2009). We also allow for the possibility that missingness has a counterfactual component, one that might differ between the treatment and control due to differential dropout, a feature that previous literature has not addressed.
Michael R. Elliott, Yun Li, Jeremy M. G. Taylor

Assessment of Treatment-Mediated Correlation Between a Clinical Endpoint and a Biomarker

There is an increasing need to identify biomarkers (BMKs) that respond early to drug treatment to help decision making during clinical development. One of the statistical metrics often involved in screening such BMKs from a single study is the assessment of correlation between a candidate BMK and a primary clinical endpoint. In this chapter, some drawbacks of relying on simple regression models for such an investigation are discussed first, followed by a real example that demonstrates the danger of relying on static data to assess such a correlation. A theoretical justification is then given to promote the idea of pursuing treatment-mediated correlation patterns. The rest of the chapter focuses on how to estimate correlation under this preferred metric from data with a parallel-group design and time-to-event (T2E) as the primary clinical endpoint. A joint modeling framework for T2E and a longitudinally measured BMK is then introduced, with a detailed explanation of how to parameterize the joint model and interpret some key parameters. By comparing the performance of three different models, the results from the analysis of an AIDS trial are presented to demonstrate the benefit of joint modeling of T2E and BMK, followed by a brief discussion.
Peter H. Hu

Biomarker Selection in Medical Diagnosis

A biomarker is usually used as a diagnostic or assessment tool in medical research. Finding a single ideal biomarker with a high level of both sensitivity and specificity is not an easy task, especially when a high specificity is required for a population screening tool. Combining multiple biomarkers is a promising alternative and can provide a better overall performance than the use of a single biomarker. The area under the receiver operating characteristic (ROC) curve is the most popular measure for evaluating a diagnostic tool. In this study, we consider the criterion of the partial area under the ROC curve (pAUC) for the purpose of population screening. Under the binormality assumption, we obtain the optimal linear combination of biomarkers in the sense of maximizing the pAUC with a pre-specified specificity level. Furthermore, statistical testing procedures based on the optimal linear combination are developed to assess the discriminatory power of a biomarker set and an individual biomarker, respectively. Stepwise biomarker selections, by embedding the proposed tests, are introduced to identify those biomarkers of statistical significance among a biomarker set. Rather than for an exploratory study, our methods, providing computationally intensive statistical evidence, are more appropriate for a confirmatory analysis, where the data have been adequately filtered. The applicability of the proposed methods is shown via several real data sets with a moderate number of biomarkers.
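For orientation, the pAUC criterion mentioned above can be written out for a single marker under the binormal model. This is the standard textbook form, not the chapter's derivation for linear combinations; the symbols $\mu_0, \sigma_0$ (non-diseased), $\mu_1, \sigma_1$ (diseased), and the false-positive bound $t_0$ (one minus the pre-specified specificity) are notation assumed here.

```latex
% ROC curve of a single marker under binormality, as a function of the false-positive rate t
\mathrm{ROC}(t) = \Phi\!\left( \frac{\mu_1 - \mu_0}{\sigma_1} + \frac{\sigma_0}{\sigma_1}\,\Phi^{-1}(t) \right),
\qquad 0 < t < 1

% Partial AUC restricted to false-positive rates no larger than t_0
\mathrm{pAUC}(t_0) = \int_0^{t_0} \mathrm{ROC}(t)\,dt
```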
Man-Jen Hsu, Yuan-Chin Ivan Chang, Huey-Miin Hsueh

Bayesian Statistics in Clinical Trials


Safety Concerns of the 3+3 Design: A Comparison to the mTPI Design

The 3+3 design is the most common choice by clinicians for phase I dose-escalation oncology trials. In recent reviews, more than 90% of phase I trials are based on the 3+3 design (Rogatko et al., Journal of Clinical Oncology 25:4982–4986, 2007). The simplicity and transparency of 3+3 allow clinicians to conduct dose escalations in practice with virtually no logistic cost, and trial protocols based on 3+3 pass IRB and biostatistics reviews briskly. However, the performance of 3+3 has never been compared to model-based designs under simulation studies with matched sample sizes. In the vast majority of statistical literature, 3+3 has been shown to be inferior in identifying the true MTD, although the sample size required by 3+3 is often a magnitude smaller than that of model-based designs. In this paper, through comparative simulation studies with matched sample sizes, we demonstrate that the 3+3 design has higher risks of exposing patients to toxic doses above the MTD than the mTPI design (Ji et al., Clinical Trials 7:653–663, 2010), a newly developed adaptive method. In addition, compared to mTPI, 3+3 does not provide higher probabilities of identifying the correct MTD even when the sample size is matched. Given that the mTPI design is equally transparent, simple and costless to implement with free software, and more flexible in practical situations, we strongly encourage wider adoption of the mTPI design in early dose-escalation studies whenever the 3+3 design is also considered. We provide free software to allow direct comparisons of the 3+3 design to other model-based designs in simulation studies with matched sample sizes.
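As a rough illustration of the kind of simulation the abstract refers to, the sketch below simulates one trial under a common textbook variant of the 3+3 rule. It is not the authors' software and does not implement mTPI; the function name, the escalation variant (no de-escalation, MTD taken as the dose below the stopping dose), and the toxicity probabilities are all assumptions made for this example.

```python
import random


def simulate_three_plus_three(tox_probs, rng):
    """Simulate one trial under a simplified 3+3 rule (an assumed variant).

    tox_probs: true dose-limiting toxicity (DLT) probability at each dose level.
    Returns the index of the declared MTD, or -1 if even the lowest dose
    is judged too toxic.
    """
    dose = 0
    while dose < len(tox_probs):
        # First cohort of three patients at the current dose.
        dlts = sum(rng.random() < tox_probs[dose] for _ in range(3))
        if dlts == 1:
            # Exactly one DLT in three: expand the cohort to six patients.
            dlts += sum(rng.random() < tox_probs[dose] for _ in range(3))
        if dlts >= 2:
            # Two or more DLTs at this dose: stop, MTD is the previous dose.
            return dose - 1
        # 0/3 or 1/6 DLTs: escalate to the next dose.
        dose += 1
    # Escalated past the highest dose without stopping.
    return len(tox_probs) - 1


if __name__ == "__main__":
    rng = random.Random(2012)
    true_tox = [0.05, 0.10, 0.25, 0.45]  # hypothetical scenario
    picks = [simulate_three_plus_three(true_tox, rng) for _ in range(2000)]
    for level in range(-1, len(true_tox)):
        print(level, picks.count(level) / len(picks))
```

Repeating the simulation over many trials, as in the `__main__` section, gives the selection frequency of each dose, which is the quantity one would compare against a model-based design run on the same scenarios.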
Yuan Ji, Sue-Jane Wang

Bayesian Interim Inference of Probability of Clinical Trial Success

Understanding of the efficacy of an investigated compound in early drug development often relies on assessment of a biomarker or multiple biomarkers that are believed to be correlated with the intended clinical outcome. The biomarker of interest may require considerable time to show a satisfactory response to the drug. Meanwhile, many drug candidates in the portfolio of a pharmaceutical company may compete for the limited resources available. Thus decisions based on assessment of the biomarker after a prolonged duration may be inefficient. One solution is to take longitudinal measurements of the biomarker during the expected duration and to conduct an analysis in the middle of the trial, so that the interim measurements may help estimate the measurement at the intended time for interim decision making. Considering the small trial sizes typical of early drug development and the convenience in facilitating interim decisions, we applied Bayesian inference to interim analysis of biomarkers.
Ming-Dauh Wang, Grace Ying Li

Bayesian Survival Analysis Using Log-Linear Median Regression Models

For the analysis of survival data from clinical trials, the popular semiparametric models such as Cox’s (1972) proportional hazards model and linear transformation models (Cheng et al. 1995) usually focus on modeling effects of covariates on the hazard ratio or the survival response. Often, there is substantial information available in the data to make inferences about the median/quantiles. Models based on median/quantile survival (Ying et al. 1995) have been shown to be useful for describing covariate effects. In this paper, we present two novel survival models with log-linear median regression functions. These two wide classes of semiparametric models have many desirable properties including model identifiability, closed form expressions for all quantile functions, and nonmonotone hazards. Our models also have many important practical advantages, including the ease of determination of priors, a simple interpretation of the regression parameters via the ratio of median survival times, and the ability to address heteroscedasticity of survival response. We illustrate the advantages of the proposed methods through extensive simulation studies investigating small sample performance and robustness properties compared to competing methods for median regression, which provide further guidance regarding appropriate modeling in clinical trials.
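The interpretation of the regression parameters as a ratio of median survival times follows directly from a log-linear median function. A short sketch, with notation assumed here rather than taken from the chapter ($T$ the survival time, $x$ the covariate vector, $\beta$ the regression coefficients, $e_j$ the $j$-th unit vector):

```latex
% Log-linear median regression function
\log \operatorname{Med}(T \mid x) = x^{\top}\beta

% A one-unit increase in covariate j multiplies the median survival time by exp(beta_j)
\frac{\operatorname{Med}(T \mid x + e_j)}{\operatorname{Med}(T \mid x)}
  = \frac{\exp\{(x + e_j)^{\top}\beta\}}{\exp\{x^{\top}\beta\}}
  = \exp(\beta_j)
```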
Jianchang Lin, Debajyoti Sinha, Stuart Lipsitz, Adriano Polpo

Bayesian Analysis of Survival Data with Semi-competing Risks and Treatment Switching

Treatment switching is common in clinical trials due to ethical and practical reasons. When the outcome of interest is time to death and patients are switched at the time of an intermediate nonterminal event, the semi-competing risks issue intertwines with the challenge of treatment switching. In this chapter, we develop a Bayesian conditional model for survival data with semi-competing risks in the presence of partial treatment switching. Properties of the conditional model are examined and an efficient Gibbs sampling algorithm is developed to sample from the posterior distribution. A Bayesian procedure to estimate the marginal survival functions and to assess the treatment effect is also derived. The Deviance Information Criterion with an appropriate deviance function and Logarithm of the Pseudo-marginal Likelihood are constructed for model comparison. The proposed method is examined empirically through a simulation study and is further applied to analyze data from a colorectal cancer study.
Yuanye Zhang, Qingxia Chen, Ming-Hui Chen, Joseph G. Ibrahim, Donglin Zeng, Zhiying Pan, Xiaodong Xue

High Dimensional Data Analysis and Statistical Genetics


High-Dimensional Ordinary Differential Equation Models for Reconstructing Genome-Wide Dynamic Regulatory Networks

The gene regulatory network (GRN) is a complex control system and plays a fundamental role in the physiological and developmental processes of living cells. Focusing on the ordinary differential equation (ODE) modeling approach, we propose a novel pipeline for constructing high-dimensional dynamic GRNs from genome-wide time course gene expression data. A five-step procedure, i.e., detection of temporally differentially expressed genes, clustering genes into functional modules, identification of network structure, parameter estimate refinement and functional enrichment analysis, is developed, combining a series of cutting-edge statistical techniques to efficiently reduce the dimension of the problem and to account for the correlations between measurements from the same gene. In the key step of identifying the network structure, we employ advanced parameter estimation and statistical inference methods to perform model selection for the ODE models. The proposed pipeline is a computationally efficient data-driven tool bridging the experimental data and the mathematical modeling and statistical analysis. The application of the pipeline to the time course gene expression data from influenza-infected mouse lungs has led to some interesting findings about the immune process in mice and also illustrated the usefulness of the proposed methods.
Shuang Wu, Zhi-Ping Liu, Xing Qiu, Hulin Wu

Kernel Methods for Regression Analysis of Microbiome Compositional Data

With the development of next generation sequencing technologies, the human microbiome can now be studied using direct DNA sequencing. Many human diseases have been shown to be associated with the disorder of the human microbiome. Previous statistical methods for associating the microbiome composition with an outcome such as disease status focus on the association of the abundance of individual taxa or their abundance ratios with the outcome variable. However, the problem of multiple testing leads to loss of power to detect the association. When individual taxon-level association tests fail, an overall test, which pools the individually weak association signals, can be applied to test the significance of the effect of the overall microbiome composition on an outcome variable. In this paper, we propose a kernel-based semi-parametric regression method for testing the significance of the effect of the microbiome composition on a continuous or binary outcome. Our method provides the flexibility to incorporate the phylogenetic information into the kernels as well as the ability to naturally adjust for the covariate effects. We evaluate our methods using simulations as well as a real data set on testing the significance of the human gut microbiome composition on body mass index (BMI) while adjusting for total fat intake. Our results suggest that the gut microbiome has a strong effect on BMI and that this effect is independent of total fat intake.
Jun Chen, Hongzhe Li

A Conditional Autoregressive Model for Detecting Natural Selection in Protein-Coding DNA Sequences

Phylogenetics, the study of evolutionary relationships among groups of organisms, has played an important role in modern biological research, such as genomic comparison, detecting orthology and paralogy, estimating divergence times, reconstructing ancient proteins, identifying mutations likely to be associated with disease, determining the identity of new pathogens, and finding the residues that are important to natural selection. Given an alignment of protein-coding DNA sequences, most methods for detecting natural selection rely on estimating the codon-specific nonsynonymous/synonymous rate ratios (dN/dS). Here, we describe an approach to modeling variation in dN/dS by using a conditional autoregressive (CAR) model. The CAR model relaxes the assumption made in most contemporary phylogenetic models that sites in molecular sequences evolve independently. By incorporating the information stored in the Protein Data Bank (PDB) file, the CAR model estimates dN/dS based on the protein three-dimensional structure. We implement the model in a fully Bayesian approach with all parameters of the model considered as random variables and make use of NVIDIA’s parallel computing architecture (CUDA) to accelerate the calculation. Our result from analyzing an empirical abalone sperm lysin data set is in accordance with previous findings.
Yu Fan, Rui Wu, Ming-Hui Chen, Lynn Kuo, Paul O. Lewis

Dimension Reduction for Tensor Classification

This article develops a sufficient dimension reduction method for high dimensional regression with tensor predictors, which extends the conventional vector-based dimension reduction model. It proposes a tensor dimension reduction model that assumes that a response depends on some low-dimensional representation of tensor predictors through an unspecified link function. A sequential iterative dimension reduction algorithm (SIDRA) that effectively utilizes the tensor structure is proposed to estimate the parameters. The SIDRA generalizes the method in Zhong and Suslick (2012), which proposes an iterative estimation algorithm for matrix classification. Preliminary studies demonstrate that the tensor dimension reduction model is a rich and flexible framework for high dimensional tensor regression, and SIDRA is a powerful and computationally efficient method.
Peng Zeng, Wenxuan Zhong

Successive Standardization: Application to Case-Control Studies

In this note we illustrate the use and applicability of successive standardization (or normalization), studied earlier by some of the same authors (see Olshen and Rajaratnam, Algorithms 5(1):98–112, 2012; Olshen and Rajaratnam, Proceedings of the 1st International Conference on Data Compression, Communication and Processing (CCP 2011), June 21–24, 2011; Olshen and Rajaratnam, Annals of Statistics 38(3):1638–1664, 2010), in the context of biomedical applications. Successive standardization constitutes a type of normalization that is applied to rectangular arrays of numbers. An iteration begins with operations on rows: first the mean of each row is subtracted from the elements of that row; then the row elements are divided by their row standard deviation. This constitutes half an iteration. These two operations are then applied successively at the level of columns, constituting the other half of the iteration. The four operations together constitute one full iteration. The process is repeated again and again and is referred to as “successive standardization.” The earlier work cited above addresses both theoretical and numerical properties of the successive standardization procedure, including convergence, rates of convergence, and illustrations. In this note, we consider the application of successive standardization to a specific biomedical context, that of case–control studies in cardiovascular biology. We demonstrate that successive standardization is very useful for identifying novel gene therapeutic targets. In particular, we demonstrate that successive standardization identifies genes that otherwise would have been rendered not significant in a Significance Analysis of Microarrays (SAM) study had standardization not been applied.
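The row-then-column procedure described above can be sketched in a few lines. This is a minimal illustration using only the Python standard library, not the authors' implementation; the function names, the use of the population standard deviation, and the fixed iteration count (rather than a convergence check) are assumptions made for the example.

```python
from statistics import fmean, pstdev


def standardize_rows(matrix):
    """Subtract each row's mean and divide by its standard deviation.

    Assumes no row is constant; the population SD (divide by n) is an
    assumption of this sketch.
    """
    result = []
    for row in matrix:
        mean = fmean(row)
        sd = pstdev(row)
        result.append([(value - mean) / sd for value in row])
    return result


def transpose(matrix):
    """Swap rows and columns of a rectangular list-of-lists."""
    return [list(column) for column in zip(*matrix)]


def successive_standardization(matrix, n_iter=50):
    """Apply n_iter full iterations: row half-step, then column half-step."""
    current = [[float(value) for value in row] for row in matrix]
    for _ in range(n_iter):
        current = standardize_rows(current)  # first half: rows
        # Second half: standardize columns by working on the transpose.
        current = transpose(standardize_rows(transpose(current)))
    return current


if __name__ == "__main__":
    data = [
        [1, 2, 4, 7, 3],
        [2, 9, 1, 5, 8],
        [6, 3, 7, 2, 4],
        [5, 8, 2, 9, 1],
    ]  # hypothetical 4 x 5 array
    for row in successive_standardization(data):
        print([round(value, 3) for value in row])
```

Because each full iteration ends with the column half-step, the columns of the returned array have mean zero and unit standard deviation up to rounding; how closely the rows also do depends on how far the iteration has converged.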
Bala Rajaratnam, Sang-Yun Oh, Michael T. Tsiang, Richard A. Olshen

Survival Analysis


Quantification of PFS Effect for Accelerated Approval of Oncology Drugs

By the accelerated approval (AA) mechanism (Code of Federal Regulations, 21 CFR 314 and 601, Accelerated Approval Rule, 1992), the FDA may grant approval of drugs or biologic products that are intended to treat serious or life-threatening diseases using a surrogate endpoint that is reasonably likely to predict clinical benefit. In oncology, progression-free survival (PFS) is increasingly used as such a surrogate of overall survival (OS) in Phase III confirmatory trials. Improved understanding of how to deal with the PFS endpoint in trial conduct and data analysis has mitigated some regulatory concerns about this endpoint. However, a glaring gap still exists as to how to determine whether the outcome from a registration trial with PFS as the primary endpoint at the time of analysis is reasonably likely to predict a clinical benefit as normally reflected through an effect on OS. Since there is no guidance on this, regulatory agencies tend to look for a compelling PFS effect coupled with an OS effect in the right direction without specification of the effect sizes and significance levels. To address this issue, we propose a synthesized approach that combines the observed OS effect and the estimated OS effect from the PFS data to explicitly test the implicit OS hypothesis at the time of primary analysis. The proposed approach is applied to hypothetical Phase III trials in metastatic colorectal cancer and adjuvant colon cancer settings using the relationships between OS effect size and PFS effect size established from historical data. Prior information on such a historical relationship is frequently cited by relevant decision makers during regulatory reviews for drug approval. However, the information is rarely fully accounted for in the actual (mostly qualitative) decision-making process. Our approach provides a simple analytic tool for deriving a more quantitative decision. It is clear that a design based on our approach may have a larger sample size than a conventional trial with PFS as the primary endpoint, but it directly addresses the elusive OS question that a conventional PFS trial cannot, no matter how good a surrogate endpoint PFS is.
Cong Chen, Linda Z. Sun

Integrative Analysis of Multiple Cancer Prognosis Datasets Under the Heterogeneity Model

In cancer research, genomic studies have been extensively conducted, searching for markers associated with prognosis. Because of the “large d, small n” characteristic, results generated from the analysis of a single dataset can be unsatisfactory. Integrative analysis simultaneously analyzes multiple datasets and can be more effective than the analysis of single datasets and classic meta-analysis. In many existing integrative analyses, the homogeneity model has been assumed, which postulates that different datasets share the same set of markers. In practice, datasets may have been generated in studies that differ in patient selection criteria, profiling techniques, and many other aspects. Such differences may make the homogeneity model too restrictive. Here we explore the heterogeneity model, which assumes that different datasets may have different sets of markers. With multiple cancer prognosis datasets, we adopt the AFT (accelerated failure time) models to describe survival. A weighted least squares approach is adopted for estimation. For marker selection, penalization-based methods are examined. These methods have intuitive formulations and can be computed using effective group coordinate descent algorithms. Analysis of three lung cancer prognosis datasets with gene expression measurements demonstrates the merit of the heterogeneity model and the proposed methods.
Jin Liu, Jian Huang, Shuangge Ma

Safety and Risk Analysis


On Analysis of Low Incidence Adverse Events in Clinical Trials

In drug or vaccine development, some adverse events (AEs) of interest may occur infrequently. Because of their clinical importance, those AEs may be studied in a clinical trial with large sample size, long-term follow-up, or in meta-analysis of combined data from multiple trials. The conventional summary and analysis methods based on frequency of first occurrence and comparing the proportion difference between treatment groups may not be the best approach because (1) the drug exposure information is not considered in the frequency summary and analysis and (2) any recurrence of an event in the long-term follow-up is not accounted for. When recurrent events are considered, analysis issues such as intra-subject correlation among the recurrent events, over-dispersion, and zero inflation may need to be addressed. In this paper, we review several approaches for summary and analysis of safety data in these settings. Consideration is given to the assumptions about the risk function, adjustment for differential follow-up, and handling of over-dispersion and excess zeros for low incidence events. Applications to drug and vaccine clinical trials are used for demonstration.
G. Frank Liu

Statistical Power to Detect Cardiovascular Signals in First-in-Human Trials: Is It Really Small?

It is widely accepted that, due to the small size of first-in-human (FIH) trials, safety signals are difficult to detect. The chances of detecting early signals in cardiovascular safety, including heart rate, blood pressure, QT prolongation, etc., have long been considered to be remote. However, much of this belief is based on an analysis involving pair-wise comparisons of very small cohorts. When dose is considered as a continuous variable, dose–response becomes the main focus and power can be significantly improved with appropriate testing procedures. In this research, we try to quantify through simulations the power in this setting and demonstrate that cardiovascular safety signals in general have reasonable statistical power for early detection when using a dose–response analysis. The simulations account for different magnitudes of effects and various scenarios including linear, log-linear, and Emax relationships between dose and safety signal, together with multiple parametric and nonparametric tests.
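For readers unfamiliar with the three dose–response shapes named above, one common way to write the mean safety signal as a function of dose $d$ is shown below. The parameterizations are assumptions for illustration (in particular the $\log(d + 1)$ form, chosen so that a placebo dose of zero is defined); the chapter's own simulation models may differ.

```latex
% Linear in dose
\mu_{\text{lin}}(d) = \beta_0 + \beta_1 d

% Log-linear in dose (offset of 1 assumed so that d = 0 is allowed)
\mu_{\text{log}}(d) = \beta_0 + \beta_1 \log(d + 1)

% Emax model: E_0 is the placebo response, E_max the maximal effect,
% ED_50 the dose producing half of the maximal effect
\mu_{\text{Emax}}(d) = E_0 + \frac{E_{\max}\, d}{ED_{50} + d}
```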
Ouhong Wang, Mike Hale, Jing Huang
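
The contrast between a pair-wise comparison and a dose–response analysis can be sketched with a small simulation. This is an illustrative simplification and not the authors' simulation design: it assumes a linear dose effect, normally distributed responses with a known standard deviation (so z-tests can be used), and hypothetical dose levels, cohort size, and slope.

```python
import random
from statistics import NormalDist, mean


def simulate_power(doses, n_per_dose, slope, sigma=1.0, alpha=0.05,
                   n_sim=2000, seed=1):
    """Estimate power of (a) a top-dose vs lowest-dose z-test and
    (b) a z-test on the linear regression slope over all doses.
    Assumes a known sigma, which is a simplification."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    x = [d for d in doses for _ in range(n_per_dose)]  # one dose per subject
    x_bar = mean(x)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    lo, hi = min(doses), max(doses)
    hits_pair = hits_trend = 0
    for _ in range(n_sim):
        y = [slope * xi + rng.gauss(0.0, sigma) for xi in x]
        # (a) pair-wise comparison using only two cohorts
        y_lo = [yi for xi, yi in zip(x, y) if xi == lo]
        y_hi = [yi for xi, yi in zip(x, y) if xi == hi]
        z_pair = (mean(y_hi) - mean(y_lo)) / (sigma * (2 / n_per_dose) ** 0.5)
        # (b) least-squares slope using every cohort
        b_hat = sum((xi - x_bar) * yi for xi, yi in zip(x, y)) / sxx
        z_trend = b_hat / (sigma / sxx ** 0.5)
        hits_pair += abs(z_pair) > z_crit
        hits_trend += abs(z_trend) > z_crit
    return hits_pair / n_sim, hits_trend / n_sim


# Hypothetical FIH-like setting: 8 dose levels, 4 subjects each.
power_pair, power_trend = simulate_power(list(range(8)), 4, slope=0.2)
print(f"pair-wise: {power_pair:.2f}, dose-response trend: {power_trend:.2f}")
```

Under these assumptions the trend test uses all cohorts and therefore tends to reject more often than the two-cohort comparison. Log-linear or Emax shapes and nonparametric tests, which the abstract mentions, are not covered here.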

Longitudinal and Spatial Data Analysis


Constructing Conditional Reference Charts for Grip Strength Measured with Error

Muscular strength, usually quantified through grip strength, can be used in humans and animals as an indicator of neuromuscular function or to assess hand function in patients with trauma or congenital problems. Because grip strength cannot be accurately measured, several contaminated measurements are often taken on the same subject. A research interest in grip strength studies is estimating the conditional quantiles of the latent grip strength, which can be used to construct conditional grip strength charts. Current work in the literature often applies conventional quantile regression using the subject-specific average of the repeated measurements as the response variable. We show that this approach suffers from model misspecification and often leads to biased estimates of the conditional quantiles of the latent grip strength. We propose a new semi-nonparametric estimation approach, which is able to account for measurement errors and allows the subject-specific random effects to follow a flexible distribution. We demonstrate through simulation studies that the proposed method leads to consistent and efficient estimates of the conditional quantiles of the latent response variable. The value of the proposed method is assessed by analyzing a grip strength data set on laboratory mice.
Pedro A. Torres, Daowen Zhang, Huixia Judy Wang
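
One simple way to see why quantiles based on subject-specific averages can be biased is the following sketch. It is only an unconditional illustration with made-up numbers (normal latent strength, normal measurement error, three repeated measurements), not the semi-nonparametric method proposed in the chapter.

```python
import random
from statistics import mean, quantiles


def upper_decile_latent_vs_average(n_subjects=20000, n_repeats=3,
                                   latent_sd=10.0, error_sd=10.0, seed=7):
    """Return the 0.9 quantile of the latent values and of the
    subject-specific averages of error-contaminated measurements."""
    rng = random.Random(seed)
    latent = [rng.gauss(100.0, latent_sd) for _ in range(n_subjects)]
    averages = [
        mean(z + rng.gauss(0.0, error_sd) for _ in range(n_repeats))
        for z in latent
    ]
    # quantiles(..., n=10) returns the 9 decile cut points; index 8 is the 0.9 quantile
    return quantiles(latent, n=10)[8], quantiles(averages, n=10)[8]


q_latent, q_average = upper_decile_latent_vs_average()
print(f"0.9 quantile, latent: {q_latent:.1f}; from averages: {q_average:.1f}")
```

Averaging reduces the measurement error but does not remove it, so the averages are more dispersed than the latent values and their upper quantiles sit too high (and lower quantiles too low) in this setup.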

Hierarchical Bayesian Analysis of Repeated Binary Data with Missing Covariates

Missing covariates are a common problem in many biomedical and environmental studies. In this chapter, we develop a hierarchical Bayesian method for analyzing data with repeated binary responses over time and time-dependent missing covariates. The fitted model consists of two parts: a generalized linear mixed probit regression model for the repeated binary responses and a joint model to incorporate information from different sources for time-dependent missing covariates. A Gibbs sampling algorithm is developed for carrying out posterior computation. The importance of the covariates is assessed via the deviance information criterion. We revisit the real plant dataset considered by Huang et al. (2008) and use it to illustrate the proposed methodology. The results from the proposed methods are compared with those in Huang et al. (2008). Similar top models and estimates of model parameters are obtained by both methods.
Fang Yu, Ming-Hui Chen, Lan Huang, Gregory J. Anderson

Multi-Regional Clinical Trials


Use of Random Effect Models in the Design and Analysis of Multi-regional Clinical Trials

In recent years, global collaboration has become a commonly used strategy for new drug development. To accelerate the development process and shorten the approval time, the design of multi-regional clinical trials (MRCTs) incorporates subjects from many countries around the world under the same protocol. After showing the overall efficacy of a drug in all global regions, one can also simultaneously evaluate the possibility of applying the overall trial results to all regions and subsequently support drug registration in each of them. Several statistical methods have been proposed for the design and evaluation of MRCTs. Most of these approaches, however, assume a common variability of the primary endpoint across regions. In practice, this assumption may not be true due to differences across regions. In this paper, we use a random effect model for modeling heterogeneous variability across regions for the design and evaluation of MRCTs.
Yuh-Jenn Wu, Te-Sheng Tan, Shein-Chung Chow, Chin-Fu Hsiao
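
As an illustrative sketch only, one generic way to write a random effect model with region-specific variability is shown below. The notation is an assumption made for this example and is not taken from the paper: $\hat{\delta}_i$ is the observed treatment-minus-control mean difference in region $i$ with $n_i$ subjects per arm, $\sigma_i^2$ is the region-specific variance of the primary endpoint, $\delta$ is the overall treatment effect, and $\tau^2$ is the between-region variance.

```latex
\hat{\delta}_i \mid \delta_i \sim N\!\left(\delta_i,\; \frac{2\sigma_i^2}{n_i}\right),
\qquad
\delta_i \sim N\!\left(\delta,\; \tau^2\right),
\qquad i = 1, \dots, K,
\qquad\text{so that}\qquad
\hat{\delta}_i \sim N\!\left(\delta,\; \tau^2 + \frac{2\sigma_i^2}{n_i}\right).
```

Setting $\sigma_i = \sigma$ for all regions recovers the common-variability assumption that the abstract describes as typical of earlier approaches.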

In Vitro Drug Combination Studies


Experimental Design for In Vitro Drug Combination Studies

In vitro drug combination studies typically involve a large number of wells with various concentrations of two drugs added together. To gain the most information from an experiment, what should the drug concentrations be? Here, we consider the case where the single-drug response curves are known beforehand, but no previous data are available for the combination. We consider several designs, including C- and D-optimal designs and a factorial design. We evaluate these designs based on the expected variance of the synergy score for a large set of in vitro experiments performed at Takeda Pharmaceuticals. Based on the results, we were able to identify which design was the most efficient and robust.
Gregory Hather, Huaihou Chen, Ray Liu