
About this Book

This book presents the proceedings of the 39th annual Midwest Biopharmaceutical Statistics Workshop (MBSW), held in Muncie, Indiana, on May 16–18, 2016. It consists of selected peer-reviewed and revised papers on topics ranging from statistical applications in drug discovery and CMC to biomarkers, clinical trials, and statistical programming. All contributions feature original research, and together they cover the full spectrum of pharmaceutical R&D – with a special focus on emergent topics such as biosimilarity, bioequivalence, clinical trial design, and subgroup identification.


Founded in 1978, the MBSW has provided a forum for statisticians to share knowledge, research, and applications on key statistical topics in pharmaceutical R&D for almost forty years, with the 2016 conference theme being “The Power and 3 I’s of Statistics: Innovation, Impact and Integrity.” The papers gathered here will be of interest to all researchers whose work involves the quantitative aspects of pharmaceutical research and development, including pharmaceutical statisticians who want to keep up-to-date with the latest trends, as well as academic statistics researchers looking for areas of application.

Table of Contents

Frontmatter

Specification and Sampling Acceptance Tests

Frontmatter

Statistical Considerations in Setting Quality Specification Limits Using Quality Data

Abstract
According to the ICH Q6A Guidance (Specifications: test procedures and acceptance criteria for new drug substances and new drug products: chemical substances, (1999) [5]), a specification is defined as a list of tests, references to analytical procedures, and appropriate acceptance criteria, which are numerical limits, ranges, or other criteria for the tests described. Specifications are usually proposed by the manufacturer and are subject to regulatory approval for use. When the acceptance criteria in product specifications cannot be pre-defined based on prior knowledge, the conventional approach is to use data from clinical batches collected during the clinical development phases. The resulting interval may be revised with the accumulated data collected from released batches after drug approval. Dong et al. (J Biopharm Stat 25:317–327, 2015 [1]) discussed the statistical properties of the commonly used intervals and made some recommendations. However, in reviewing the proposed intervals, it is often difficult for regulatory scientists to understand the differences between the intervals, because some intervals require only a pre-specified target proportion of the distribution, while others require a confidence level in addition. Therefore, we propose to use the same confidence level of 95% and calibrate each interval to its true coverage under the tolerance interval setting. It is easy to show that the prediction interval and the reference interval have variable true coverage that increases with the sample size, whereas the tolerance interval maintains a fixed true coverage. Based on our study results, we propose appropriate statistical methods for setting product specifications to better ensure product quality for regulatory purposes.
Yi Tsong, Tianhua Wang, Xin Hu
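
A minimal R sketch of the kind of comparison described in this abstract: the true coverage of a 95% prediction interval versus a 95%/95% tolerance interval (Howe approximation) under normality, estimated by simulation. This is illustrative only and is not the authors' exact calibration procedure.

```r
# Compare expected true coverage of a 95% prediction interval and a 95%/95%
# normal tolerance interval as the number of batches grows (standard normal data).
set.seed(1)
true_coverage <- function(n, n_sim = 5000) {
  k_pred <- qt(0.975, df = n - 1) * sqrt(1 + 1/n)          # prediction interval factor
  # Howe-type approximation to the two-sided 95%/95% tolerance factor
  k_tol  <- sqrt((n - 1) * (1 + 1/n) * qnorm(0.975)^2 / qchisq(0.05, n - 1))
  cov_pred <- cov_tol <- numeric(n_sim)
  for (i in seq_len(n_sim)) {
    x <- rnorm(n)
    m <- mean(x); s <- sd(x)
    cov_pred[i] <- pnorm(m + k_pred * s) - pnorm(m - k_pred * s)
    cov_tol[i]  <- pnorm(m + k_tol  * s) - pnorm(m - k_tol  * s)
  }
  c(prediction = mean(cov_pred), tolerance = mean(cov_tol))
}
sapply(c(10, 30, 100), true_coverage)  # prediction-interval coverage drifts with n
```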

Counting Test and Parametric Two One-Sided Tolerance Interval Test for Content Uniformity Using Large Sample Sizes

Abstract
The purpose of the uniformity of dosage units test is to determine the degree of uniformity in the amount of drug substance among dosage units in a batch. Recently, several nonparametric methods have been proposed, including the large sample counting approach in European Pharmacopeia 8.1 (EU Option 2). All nonparametric methods specify, for a given large sample size, a maximum number of tablets whose contents may fall outside the interval (85%, 115%) of labeling claim (LC). The nonparametric method in the European Pharmacopeia requires another maximum number of tablets whose contents fall outside the interval (75%, 125%) LC. We refer to this nonparametric method as the counting test in the rest of the article. We focus on comparing the acceptance probabilities of EU Option 2 and the parametric two one-sided tolerance intervals (PTIT_matchUSP90) test. A counting test is, in general, less efficient than a parametric test. Our simulation study clearly shows that EU Option 2 is not sensitive to batches with large variability in contents that follow a normal distribution with an off-target mean, a mixture of two normal distributions, or a mixture of a uniform distribution with a small percentage of extreme values. EU Option 2 is not sensitive to a mean shift of the majority population (97%) from 100% LC to 90% LC. In addition, EU Option 2 is not sensitive to low assay values (about 90% LC). EU Option 2 is over-sensitive in one extreme case: 97% of tablets at 100% LC and 3% of tablets at 76% LC.
Meiyu Shen, Yi Tsong, Richard Lostritto
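
A hedged R sketch of how acceptance probabilities of a counting test of this kind can be simulated. The acceptance numbers c1 and c2 below are placeholders, not the actual EU Option 2 criteria, and the batch scenarios are assumed for illustration.

```r
# Simulated acceptance probability of a counting rule: accept the batch if at
# most c1 units fall outside (85%, 115%) LC and at most c2 outside (75%, 125%) LC.
set.seed(2)
accept_prob <- function(n, mu, sigma, c1, c2, n_sim = 10000) {
  pass <- replicate(n_sim, {
    x <- rnorm(n, mean = mu, sd = sigma)      # simulated content (% LC)
    out1 <- sum(x < 85 | x > 115)
    out2 <- sum(x < 75 | x > 125)
    out1 <= c1 && out2 <= c2
  })
  mean(pass)
}
# A batch with an off-target mean and large variability:
accept_prob(n = 100, mu = 90, sigma = 8, c1 = 3, c2 = 0)
```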

Analytical Biosimilar and Process Validation

Frontmatter

Sample Size Consideration for Equivalent Test of Tier-1 Quality Attributes for Analytical Biosimilarity Assessment

Abstract
FDA recommends a stepwise approach for obtaining the totality-of-the-evidence for assessing biosimilarity between a proposed biosimilar product and its corresponding reference biologic product (US Food and Drug Administration: Guidance for industry: scientific considerations in demonstrating biosimilarity to a reference product. US Food and Drug Administration, Silver Spring, 2015 [6]). The stepwise approach starts with analytical studies for assessing similarity in critical quality attributes (CQAs), which are relevant to clinical outcomes. For critical quality attributes that are most relevant to clinical outcomes (Tier 1 CQAs), FDA requires equivalence testing to be performed for similarity assessment, based on an equivalence acceptance criterion. In practice, the number of Tier 1 CQAs might be greater than one, and should be no more than four. The number of biosimilar lots is often recommended to be no less than 10, and the ratio between the reference product sample size and biosimilar product sample size is recommended to be within the range from \( 2/3 \) to \( 3/2 \) (US Food and Drug Administration: Guidance for industry: Statistical Approaches to Evaluate Analytical Similarity. US Food and Drug Administration, Silver Spring, 2017 [7]). Accordingly, we derive power calculation formulas for determining the sample size for analytical similarity assessment based on the equivalence testing currently used in analytical biosimilar assessment (Tsong et al. J Biopharm Stat 27:197–205, 2017 [10]).
Tianhua Wang, Yi Tsong, Meiyu Shen
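
A simulation-based R sketch of power for a Tier 1 two one-sided tests (TOST) equivalence assessment. The margin of 1.5 times the estimated reference standard deviation and alpha = 0.05 follow the commonly described setting, but the chapter's exact closed-form derivation may differ; treat the settings below as assumptions.

```r
# Power of a TOST equivalence test of the mean difference with a margin based
# on the estimated reference-lot standard deviation.
set.seed(3)
tost_power <- function(n_T, n_R, true_diff, sigma, margin_mult = 1.5,
                       alpha = 0.05, n_sim = 5000) {
  pass <- replicate(n_sim, {
    x_T <- rnorm(n_T, mean = true_diff, sd = sigma)   # biosimilar lots
    x_R <- rnorm(n_R, mean = 0,         sd = sigma)   # reference lots
    margin <- margin_mult * sd(x_R)                   # margin from estimated sigma_R
    se <- sqrt(var(x_T)/n_T + var(x_R)/n_R)
    df <- n_T + n_R - 2
    d  <- mean(x_T) - mean(x_R)
    ((d + margin)/se > qt(1 - alpha, df)) && ((d - margin)/se < -qt(1 - alpha, df))
  })
  mean(pass)
}
tost_power(n_T = 10, n_R = 10, true_diff = 0, sigma = 1)
```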

A Probability Based Equivalence Test of NIR Versus HPLC Analytical Methods in a Continuous Manufacturing Process Validation Study

Abstract
Continuous manufacturing processes rely on Process Analytical Technology (PAT) and chemometric Near Infrared (NIR) technologies to carry out real time release testing (RTRt). A critical requirement for this purpose is to establish the equivalence of the NIR analytical method with the gold standard analytical method, say an HPLC method. We propose a variance components model that acknowledges the inherent blocking across individual dosage units through a paired comparison. Variance terms corresponding to dosage unit, location effects due to a stratified sampling plan, and heterogeneous residual terms provide estimates of the total measurement uncertainty of both methods, free of dosage unit effects. Bayesian posterior parameter estimates and the posterior predictive distribution are used to assess the performance of the NIR method in relation to the HPLC gold standard method as a measure of equivalence, referred to as a Relative Performance Index (Rel_Pfm). An acceptably high probability of a Rel_Pfm of 1 (or greater) is proposed as the essential requirement for establishing equivalence (or superiority).
Areti Manola, Steven Novick, Jyh-Ming Shoung, Stan Altan

A Further Look at the Current Equivalence Test for Analytical Similarity Assessment

Abstract
Establishing analytical similarity is the foundation of biosimilar product development. Although there is no guidance on how to evaluate analytical data for similarity, the US Food and Drug Administration (FDA) recently suggested an equivalence test on the mean difference between the innovator and the biosimilar product as the statistical similarity assessment for Tier 1 quality attributes (QAs), defined as the QAs that are directly related to the mechanism of action. However, the mathematical derivation and simulation work presented in this paper show that the type I error is typically increased in most realistic settings when an estimate of sigma is used for the equivalence margin. This error cannot be improved by increasing the sample size. The impacts of the constant c on type I error and sample size adjustment in the imbalanced situation are also discussed.
Neal Thomas, Aili Cheng
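
A short R sketch of the type I error issue described in this abstract: when the equivalence margin is set to a constant c times the estimated (rather than true) reference standard deviation, the rejection rate at the boundary of the null can exceed the nominal level. The constants below are assumptions for illustration.

```r
# Empirical type I error of the TOST when the margin uses the estimated sigma_R
# and the true mean difference sits on the null boundary c * sigma.
set.seed(4)
type1_error <- function(n_T, n_R, sigma = 1, c = 1.5, alpha = 0.05, n_sim = 20000) {
  rej <- replicate(n_sim, {
    x_T <- rnorm(n_T, mean = c * sigma, sd = sigma)   # difference on the null boundary
    x_R <- rnorm(n_R, mean = 0,         sd = sigma)
    margin <- c * sd(x_R)                             # margin from estimated sigma_R
    se <- sqrt(var(x_T)/n_T + var(x_R)/n_R)
    df <- n_T + n_R - 2
    d  <- mean(x_T) - mean(x_R)
    ((d + margin)/se > qt(1 - alpha, df)) && ((d - margin)/se < -qt(1 - alpha, df))
  })
  mean(rej)   # compare against the nominal alpha = 0.05
}
type1_error(n_T = 10, n_R = 10)
```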

Shiny Tools for Sample Size Calculation in Process Performance Qualification of Large Molecules

Abstract
The regulatory guidance documents on process validation have recently been revised to emphasize the three-stage lifecycle approach throughout validation. As an important milestone within Stage 2, process qualification, the process performance qualification (PPQ) requires taking adequate samples to provide sufficient statistical confidence of quality both within a batch and between batches. To help meet the PPQ requirements and to further support continued process verification for large molecules, Shiny tools have been developed for continuous critical quality attributes to calculate the minimum numbers of samples within batches needed to keep the batch-specific beta-content tolerance intervals within prespecified acceptance ranges. The tolerance intervals at the attribute level are also displayed to assure the suitability of the predefined number of PPQ batches. In addition, another Shiny application for creating and evaluating sampling plans for binary attributes is illustrated in terms of failure rates of future batches and consumer's and producer's risk probabilities. The tools for both continuous and binary attributes allow users to adjust the sampling plans based on historical data and are designed with interactive features, including dynamic inputs, outputs, and visualization.
Qianqiu Li, Bill Pikounis
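
A minimal R sketch of evaluating a binary-attribute sampling plan in the spirit of the Shiny application described above (accept a batch if at most a fixed number of sampled units fail). The plan size, acceptance number, and quality levels are assumed values, not those from the chapter.

```r
# Operating characteristic of an (n, acc) attribute sampling plan and the
# resulting producer's and consumer's risks.
oc_curve <- function(n, c_accept, p) pbinom(c_accept, size = n, prob = p)
n   <- 60
acc <- 1        # accept if at most 1 failure among n samples
p_good <- 0.005 # acceptable failure rate
p_bad  <- 0.05  # unacceptable failure rate
producers_risk <- 1 - oc_curve(n, acc, p_good)  # probability of rejecting a good batch
consumers_risk <- oc_curve(n, acc, p_bad)       # probability of accepting a bad batch
c(producers_risk = producers_risk, consumers_risk = consumers_risk)
```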

Continuous Process

Frontmatter

Risk Evaluation of Registered Specifications and Internal Release Limits Using a Bayesian Approach

Abstract
This article proposes to pursue advanced statistical approaches to quantify risks systematically through a product lifecycle for sound decision making. The work focuses on registered specifications and internal release limits as these are important elements in pharmaceutical development, manufacturing, and supply to ensure product safety, efficacy, and quality. Bayesian inference is explored as a potential valuable approach to enhance risk assessment and related decision making. A Bayesian approach is utilized to predict risks of batch failure and poor process capability associated with registered specifications and internal release limits, leading to a more effective specification setting process. The benefits are demonstrated using a real-life case.
Yijie Dong, Tianhua Wang

Development of Statistical Computational Tools Through Pharmaceutical Drug Development and Manufacturing Life Cycle

Abstract
Statisticians at Pfizer who support Chemistry, Manufacturing, and Controls (CMC), and Regulatory Affairs (Reg CMC) have developed many statistical R-based computational tools to enable high efficiency, consistency, and fast turnaround in their routine statistical support to drug product and manufacturing process development. Most tools have evolved into web-based applications for convenient access by statisticians and colleagues across the company. These tools cover a wide range of areas, such as product stability and shelf life or clinical use period estimation, process parameter criticality assessment, and design space exploration through experimental design and parametric bootstrapping. In this article, the general components of these R-programmed web-based computational tools are introduced, and their successful applications are demonstrated through an application of estimating a drug product shelf life based on stability data.
Fasheng Li, Ke Wang
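
An illustrative R sketch of the shelf-life application mentioned above, in the usual stability-regression style: regress assay on time and take the shelf life as the last time point at which the one-sided 95% lower confidence bound stays above the specification. The data and the 95.0% LC lower specification are made up for illustration.

```r
# Shelf-life estimation from (hypothetical) stability data.
months <- c(0, 3, 6, 9, 12, 18, 24)
assay  <- c(100.1, 99.8, 99.5, 99.0, 98.8, 98.1, 97.6)   # hypothetical % LC
fit  <- lm(assay ~ months)
grid <- data.frame(months = seq(0, 60, by = 0.1))
ci   <- predict(fit, newdata = grid, interval = "confidence",
                level = 0.90)            # two-sided 90% = one-sided 95% bound
lower_spec <- 95.0
shelf_life <- max(grid$months[ci[, "lwr"] >= lower_spec])
shelf_life   # estimated shelf life in months
```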

Application of Advanced Statistical Tools to Achieve Continuous Analytical Verification: A Risk Assessment Case of the Impact of Analytical Method Performance on Process Performance Using a Bayesian Approach

Abstract
The criticality of robust analytical performance is increasingly recognized in the pharmaceutical industry. An effective analytical control strategy needs to be defined, along with a process control strategy, to ensure that the measurement uncertainties are controlled to achieve the intended purposes of analytical methods. The principles of Continuous Process Verification (CPV) have been applied to the lifecycle management of analytical robustness, which leads to our vision of Continuous Analytical Verification (CAV) through a product lifecycle. This work proposes to apply advanced statistical tools to deliver on the vision of CAV. A Bayesian hierarchical modeling approach is a potential solution to integrate a risk-based control strategy into the framework of CAV from design and qualification to continued verification. A case study is included to illustrate the benefits of a Bayesian-based systematic tool in assessing the impact of analytical performance on process performance and in informing decisions related to the analytical control strategy, in order to ensure analytical and process robustness.
Iris Yan, Yijie Dong

Clinical Trial Design and Analysis

Frontmatter

Exact Inference for Adaptive Group Sequential Designs

Abstract
In this paper we present a method for estimating the treatment effect in a two-arm adaptive group sequential clinical trial that permits sample size re-estimation, alterations to the number and spacing of the interim looks, and changes to the error spending function based on an unblinded look at the accruing data. The method produces a median unbiased point estimate and a confidence interval having exact coverage of the parameter of interest. The procedure is based on mapping the final test statistic obtained in the modified trial into a corresponding backward image in the original trial. Methods that were developed for classical (non-adaptive) group sequential inference can then be applied to the backward image.
Cyrus Mehta, Lingyun Liu, Pranab Ghosh, Ping Gao

A Novel Framework for Bayesian Response-Adaptive Randomization

Abstract
The development of response-adaptive randomization (RAR) has taken many different paths over the past few decades. Some RAR schemes optimize certain criteria, but may be complicated and often rely on asymptotic arguments, which may not be suitable in trials with small sample sizes. Some Bayesian RAR schemes are very intuitive and easy to implement, but may not always be tailored toward the study goals. To bridge the gap between these methods, we proposed a framework in which easy-to-implement Bayesian RAR schemes can be derived to target the study goals. We showed that the popular Bayesian RAR scheme that assigns more patients to better performing arms fits in the new framework given a specific intention. We also illustrated the new framework in the setting where multiple treatment arms are compared to a concurrent control arm. Through simulation, we demonstrated that the RAR schemes developed under the new framework outperform a popular method in achieving the pre-specified study goals.
Jian Zhu, Ina Jazić, Yi Liu
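
A small R sketch of the popular Bayesian RAR scheme referenced in this abstract for binary outcomes: allocation probabilities proportional to a power of each arm's posterior probability of being the best. The Beta(1,1) prior, tuning power, and data below are assumptions for illustration, not the authors' proposed framework.

```r
# Thall/Wathen-style response-adaptive allocation weights from beta posteriors.
set.seed(5)
rar_weights <- function(successes, failures, power = 0.5, n_draws = 10000) {
  k <- length(successes)
  draws <- sapply(seq_len(k), function(j)
    rbeta(n_draws, 1 + successes[j], 1 + failures[j]))    # Beta(1,1) prior
  p_best <- tabulate(max.col(draws), nbins = k) / n_draws # P(arm j is best | data)
  w <- p_best^power                                       # power tempers extreme allocation
  w / sum(w)
}
rar_weights(successes = c(8, 12, 15), failures = c(12, 8, 5))
```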

Sample Size Determination Under Non-proportional Hazards

Abstract
The proportional hazards assumption rarely holds in clinical trials of cancer immunotherapy. Specifically, delayed separation of the Kaplan-Meier survival curves and long-term survival have been observed. Routine practice in designing a randomized controlled two-arm clinical trial with a time-to-event endpoint assumes proportional hazards. If this assumption is violated, traditional methods could inaccurately estimate statistical power and study duration. This article addresses how to determine the sample size in the presence of nonproportional hazards (NPH) due to delayed separation, diminishing effects, etc. Simulations were performed to illustrate the relationship between power and the number of patients/events for different types of nonproportional hazards. Novel efficient algorithms are proposed to optimize the selection of a cost-effective sample size.
Miao Yang, Zhaowei Hua, Saran Vardhanabhuti
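
A simulation sketch in R of log-rank power under a delayed treatment effect, one of the non-proportional-hazards patterns discussed above. The piecewise-exponential hazards, delay, and follow-up below are assumed values, not those used in the chapter.

```r
# Log-rank power under a delayed separation of survival curves.
library(survival)
set.seed(6)
sim_power <- function(n_per_arm, lambda0 = 0.10, hr_late = 0.6, delay = 6,
                      followup = 36, n_sim = 500) {
  rejected <- replicate(n_sim, {
    t0 <- rexp(n_per_arm, rate = lambda0)              # control arm
    t1 <- rexp(n_per_arm, rate = lambda0)              # treatment arm: no effect before 'delay'
    late <- t1 > delay
    t1[late] <- delay + rexp(sum(late), rate = lambda0 * hr_late)  # reduced hazard after delay
    time  <- pmin(c(t0, t1), followup)
    event <- as.integer(c(t0, t1) <= followup)
    arm   <- rep(0:1, each = n_per_arm)
    fit <- survdiff(Surv(time, event) ~ arm)
    pchisq(fit$chisq, df = 1, lower.tail = FALSE) < 0.05
  })
  mean(rejected)
}
sim_power(n_per_arm = 200)
```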

Adaptive Three-Stage Clinical Trial Design for a Binary Endpoint in the Rare Disease Setting

Abstract
A fundamental challenge in developing therapeutic agents for rare diseases is the limited number of eligible patients. A conventional randomized clinical trial may not be adequately powered if the sample size is small and asymptotic assumptions needed to apply common test statistics are violated. This paper proposes an adaptive three-stage clinical trial design for a binary endpoint in the rare disease setting. It presents an exact unconditional test statistic to generally control Type I error when sample size is small while not sacrificing power. Adaptive randomization has the potential to increase power by allocating greater numbers of patients to a more effective treatment. Performance of the method is illustrated using simulation studies.
Lingrui Gan, Zhaowei Hua

Biomarker-Driven Trial Design

Frontmatter

Clinical Trial Designs to Evaluate Predictive Biomarkers: What’s Being Estimated?

Abstract
Predictive biomarkers are used to predict whether a patient is likely to receive benefits from a therapy that outweigh its risks. In practice, a predictive biomarker is measured with a diagnostic assay or test kit. Usually the test has some potential for measuring the biomarker with error. For qualitative tests indicating presence or absence of a biomarker, the probability of misclassification is usually not zero. Study designs to evaluate predictive biomarkers include the biomarker-stratified design, the biomarker-strategy design, the enrichment (or targeted) design, and the discordant risk randomization design. Many authors have reviewed the main strengths and weaknesses of these study designs. However, the estimand being used to evaluate the performance of the predictive biomarker is usually not provided explicitly. In this chapter, we provide explicit formulas for the estimands used in common study designs assuming that the misclassification error of the biomarker test is non-differential to outcome. The estimands are expressed in terms of the biomarker's predictive capacity (differential in treatment effect between biomarker positive and negative patients when the biomarker is never misclassified) and the test's predictive accuracy (e.g., positive and negative predictive values of the test for the biomarker). Upon inspection, the estimands reveal not only well-known strengths and weaknesses of the study designs, but other insights. In particular, for the biomarker-stratified design, the estimand is the product of the biomarker predictive capacity and an attenuation factor between 0 and 1 that increases with the test's predictive accuracy. For other designs, the estimands illuminate important limitations in evaluating the clinical utility of the biomarker test. After presenting the theoretical estimands, we present and discuss estimand values for a hypothetical case study of Procalcitonin (PCT) as a biomarker in Procalcitonin-guided evaluation and management of subjects suspected of lower respiratory tract infection.
Gene Pennello, Jingjing Ye
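
A worked R illustration of the attenuation described for the biomarker-stratified design. It assumes the attenuation factor takes the simple form PPV + NPV − 1, which is consistent with the abstract's description under non-differential misclassification but is not necessarily the chapter's exact expression; all numbers are hypothetical.

```r
# Observed estimand in a biomarker-stratified design with an imperfect test:
# (attenuation factor) x (true predictive capacity of the biomarker).
stratified_estimand <- function(delta_pos, delta_neg, ppv, npv) {
  capacity    <- delta_pos - delta_neg   # treatment-effect differential, true biomarker
  attenuation <- ppv + npv - 1           # between 0 and 1; grows with predictive accuracy
  attenuation * capacity
}
stratified_estimand(delta_pos = 0.30, delta_neg = 0.05, ppv = 0.9, npv = 0.85)
```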

Biomarker Enrichment Design Considerations in Oncology Single Arm Studies

Abstract
Oncology drug development has been increasingly shaped by molecularly targeted agents (MTAs), which often demonstrate differential effectiveness driven by biomarker expression levels on tumors. Innovative statistical designs have been proposed to tackle this challenge, e.g., Freidlin et al. [3, 4], Jiang et al. [7]. All of these are essentially adaptive confirmatory Phase 3 designs that combine the testing of treatment effectiveness in the overall population with an alternative pathway for a more restrictive efficacy claim in a sensitive subpopulation. We believe that, in cases where there is a strong biological rationale to support that an MTA may provide differential benefit in a general patient population, proof-of-concept (POC) is likely intertwined with predictive enrichment. Therefore, it is imperative that early phase POC studies be designed to specifically address biomarker-related questions to improve the efficiency of development. In this paper, we propose three strategies for detecting efficacy signals in single-arm studies that allow claiming statistical significance either in the overall population or in a biomarker enriched subpopulation. None of the three methods requires pre-specification of biomarker thresholds, yet all maintain statistical rigor in the presence of multiplicity. The performance of these proposed methods is evaluated with simulation studies.
Hong Tian, Kevin Liu

Challenges of Bridging Studies in Biomarker Driven Clinical Trials: The Impact of Companion Diagnostic Device Performance on Clinical Efficacy

Abstract
Personalized medicine involves the co-development of both the therapeutic agent (Rx) and a companion diagnostic device (CDx), which directs a group of patients to a particular treatment. There are instances, however, when there are competing or multiple CDx products for a given Rx. The development of multiple CDx products can be driven by improved efficiency, cost, novel technologies, or updated techniques over time. In these instances, concordance between the old assay (e.g., the assay used in the clinical trial or comparator companion diagnostic device in this paper) and a new assay (follow-on companion diagnostic device) needs to be assessed. Discrepancies between the old and new assays, and specifically the impact of discordance on clinical efficacy, need to be evaluated. Studies that establish similarity between two or more CDx products are called bridging studies. We provide a statistical framework for method comparison studies where there is bias in measurement of one or both assessments. We then present a simulation study to evaluate the statistical impact of an imperfect CDx on the sensitivity and specificity of the follow-on companion diagnostic device. Further, we demonstrate the influence of CDx accuracy on clinical efficacy in the context of an enrichment clinical trial.
Szu-Yu Tang, Bonnie LaFleur
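
A hedged R sketch of the last point in this abstract: how an imperfect CDx can dilute the treatment effect observed in an enrichment trial that enrolls only test-positive patients. Prevalence, accuracy, and effect sizes are assumed for illustration and are not the simulation settings from the chapter.

```r
# Observed effect in a test-positive (enriched) population as a PPV-weighted
# mixture of the effects in true biomarker-positive and -negative patients.
diluted_effect <- function(prev, sens, spec, effect_pos, effect_neg) {
  ppv <- (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))  # PPV among screened patients
  ppv * effect_pos + (1 - ppv) * effect_neg
}
diluted_effect(prev = 0.3, sens = 0.9, spec = 0.8, effect_pos = 0.35, effect_neg = 0.05)
```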

Application of Novel Data Modality

Frontmatter

Parallel-Tempered Feature Allocation for Large-Scale Tumor Heterogeneity with Deep Sequencing Data

Abstract
We developed a parallel-tempered feature allocation algorithm to infer tumor heterogeneity from deep DNA sequencing data. The feature allocation model is based on a binomial likelihood and an Indian Buffet process prior on the latent haplotypes. A variation of the parallel tempering technique is introduced to flatten peaked local modes of the posterior distribution, and yields a more efficient Markov chain Monte Carlo algorithm. Simulation studies provide empirical evidence that the proposed method is superior to competing methods at a high read depth. In our application to Glioblastoma multiforme data, we found several distinctive haplotypes that indicate the presence of multiple subclones in the tumor sample.
Yang Ni, Peter Müller, Max Shpak, Yuan Ji

Analysis of T-Cell Immune Responses as Measured by Intracellular Cytokine Staining with Application to Vaccine Clinical Trials

Abstract
Recent advances in single-cell technologies, in particular intracellular cytokine staining (ICS), have enabled multidimensional functional measurements of naturally occurring or vaccine-induced T-cell responses in clinical studies. Analysis of such increasingly multidimensional datasets presents a great challenge to statisticians. Currently, multidimensional functional cell measures are largely analyzed either by univariate analysis of all combinations of functions individually or by summarizing a few particular groups of functions separately. Such simple analyses do not reflect comprehensively the polyfunctional profile of the T-cell responses, nor do they allow more sophisticated statistical analysis and inference. In this paper, we introduce a new approach to statistical inference for multidimensional ICS data. We propose to reduce the dimensionality by using a weighted sum, followed by computing the minimum and maximum of the test statistic over all eligible assignments of weights which satisfy the underlying partial ordering of the data. The computation technique is presented. Statistical inference is then based on the minimum and maximum of the test statistic. We illustrate, through an example, that the technique can be useful in reducing the complexity of the multidimensional response data and providing insightful reporting of the results.
Yunzhi Lin, Cong Han

Project Data Sphere and the Applications of Historical Patient Level Clinical Trial Data in Oncology Drug Development

Abstract
As scientific data sharing initiatives become more popular, an increasing amount of oncology clinical trial data is becoming available to the public. This historical data has the potential to help improve the design and analysis of future studies of new oncology compounds. Project Data Sphere is one such public database of oncology studies, with patient level data from over 76,000 patients. Here, we review the contents of this database and describe several examples of how the data has been used or could potentially be used in drug development. Applications include population selection, historical comparisons, and identification of stratification factors.
Greg Hather, Ray Liu

Novel Test for the Equality of Continuous Curves with Homoscedastic or Heteroscedastic Measurement Errors

Abstract
Testing equality of two curves occurs often in functional data analysis. In this paper, we develop procedures for testing whether two curves measured with either homoscedastic or heteroscedastic errors are equal. The method is applicable to a general class of curves. Compared with existing tests, ours does not require repeated measurements to obtain the variances at each of the explanatory values. Instead, our test calculates the overall variances by pooling all of the data points. The null distribution of the test statistic is derived, and an approximation formula to calculate the p value is developed when the heteroscedastic variances are either known or unknown. Simulations are conducted to show that this procedure works well in the finite sample situation. Comparisons with other test procedures are made based on simulated data sets. The application to our motivating example from an environmental study is illustrated. An R package was created for ease of general application.
Zhongfa Zhang, Yarong Yang, Jiayang Sun

Quality Control Metrics for Extraction-Free Targeted RNA-Seq Under a Compositional Framework

Abstract
The rapid rise in the use of RNA sequencing technology (RNA-seq) for scientific discovery has led to its consideration as a clinical diagnostic tool. However, as a new technology, the analytical accuracy and reproducibility of RNA-seq must be established before it can realize its full clinical utility (SEQC/MAQC-III Consortium, 2014; VanKeuren-Jensen et al. 2014). We respond to the need for reliable diagnostics, quality control metrics, and improved reproducibility of RNA-seq data by recognizing and capitalizing on the relative frequency nature of RNA-Seq data. Problems with sample quality, library preparation, or sequencing may result in a low number of reads allocated to a given sample within a sequencing run. We propose a method based on outlier detection of Centered Log-Ratio (CLR) transformed counts for objectively identifying samples that are problematic in terms of the total number of reads allocated to them. Normalization and standardization methods for RNA-Seq generally assume that the total number of reads assigned to a sample does not affect the observed relative frequencies of probes within an assay. This assumption, known as Compositional Invariance, is an important property for RNA-Seq data which enables the comparison of samples with differing read depths. Violations of the invariance property can lead to spurious differential expression results, even after normalization. We develop a diagnostic method to identify violations of the Compositional Invariance property. Batch effects arising from differing laboratory conditions or operator differences have been identified as a problem in high-throughput measurement systems (Leek et al. in Genome Biol 15, R29 [14]; Chen et al. in PLoS One 6 [10]). Batch effects are typically identified with a hierarchical clustering (HC) method or principal components analysis (PCA). For both methods, the multivariate distance between the samples is visualized, either in a biplot for PCA or a dendrogram for HC, to check for the existence of clusters of samples related to batch. We show that CLR transformed RNA-Seq data is appropriate for evaluation in a PCA biplot and improves batch effect detection over current methods. As RNA-Seq makes the transition from the research laboratory to the clinic, there is a need for robust quality control metrics. The realization that RNA-Seq data are compositional opens the door to the existing body of theory and methods developed by Aitchison (The statistical analysis of compositional data, Chapman & Hall Ltd., 1986) and others. We show that the properties of compositional data can be leveraged to develop new metrics and improve existing methods.
Dominic LaRoche, Dean Billheimer, Kurt Michels, Bonnie LaFleur
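
A minimal R sketch of the centered log-ratio (CLR) transformation and a PCA check for batch structure, as discussed above. The count matrix is simulated, the batch labels are hypothetical, and the pseudo-count of 0.5 for zero handling is an assumption.

```r
# CLR-transform a samples-by-probes count matrix, then inspect the first two
# principal components for clustering by batch.
set.seed(7)
counts <- matrix(rnbinom(20 * 50, mu = 100, size = 5), nrow = 20)  # 20 samples x 50 probes
clr <- function(x, pseudo = 0.5) {
  lx <- log(x + pseudo)
  sweep(lx, 1, rowMeans(lx), "-")     # subtract each sample's mean log (geometric mean)
}
pc <- prcomp(clr(counts), center = TRUE, scale. = FALSE)
batch <- rep(c("A", "B"), each = 10)   # hypothetical batch labels
plot(pc$x[, 1:2], col = ifelse(batch == "A", 1, 2), pch = 19,
     main = "PCA of CLR-transformed counts")
```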

Omics Data Analysis

Frontmatter

Leveraging Omics Biomarker Data in Drug Development: With a GWAS Case Study

Abstract
Biomarkers have proven powerful for target identification and for understanding disease progression, drug safety, and treatment response in drug development. Recent developments in omics technology have offered great opportunities for identifying omics biomarkers at low cost. Although biomarkers hold much promise for drug development, steep challenges arise from the high dimensionality of the data, the complexity of the technology, and an incomplete understanding of the underlying biology. In this article, the application of omics data in drug development is reviewed, and a genome-wide association study (GWAS) case study is presented.
Weidong Zhang

A Simulation Study Comparing SNP Based Prediction Models of Drug Response

Abstract
Lack of replication of findings and missing heritability are two of the major challenges in pharmacogenetics (PGx) studies. Recently developed statistical methods for genome-wide association studies offer greater power both to identify relevant genetic markers and to predict drug response or phenotype based on these markers. However, the relative performance of these methods has not been thoroughly studied. Here, we present several simulations to compare the performance of these analysis methods. In our first simulation, we compared five different approaches: Elastic Net (EN), Genome-wide Association Study (GWAS)+EN, Principal Component Regression (PCR), Random Forest (RF), and Support Vector Machine (SVM). The results showed that EN has the smallest test mean squared error (MSE) and the highest proportion of causal SNPs among the identified SNPs. In the second simulation, we compared three approaches: GWAS+EN, GWAS+RF, and GWAS+SVM. GWAS+RF has the smallest test MSE and the highest causal percentage. In the third simulation study, we compared two cross-validation procedures: GWAS+EN versus GWAS+EN with a modified learn-and-confirm cross-validation. The latter approach demonstrated better prediction accuracy at the expense of greatly increased computational time.
Wencan Zhang, Pingye Zhang, Feng Gao, Yonghong Zhu, Ray Liu
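
A skeleton in R of the kind of comparison described above, contrasting test MSE of an elastic net and a random forest on simulated SNP data. The package choices (glmnet, randomForest), simulation settings, and SNP coding are assumptions for illustration, not the paper's code.

```r
# Compare test MSE of elastic net and random forest on simulated additive SNP data.
library(glmnet); library(randomForest)
set.seed(8)
n <- 400; p <- 1000; n_causal <- 20
X <- matrix(rbinom(n * p, size = 2, prob = 0.3), nrow = n)   # SNPs coded 0/1/2
beta <- c(rnorm(n_causal, sd = 0.5), rep(0, p - n_causal))   # only the first 20 SNPs are causal
y <- X %*% beta + rnorm(n)
train <- sample(n, 300)
# Elastic net with cross-validated penalty
en <- cv.glmnet(X[train, ], y[train], alpha = 0.5)
mse_en <- mean((predict(en, X[-train, ], s = "lambda.min") - y[-train])^2)
# Random forest on the same split
rf <- randomForest(x = X[train, ], y = as.numeric(y[train]))
mse_rf <- mean((predict(rf, X[-train, ]) - y[-train])^2)
c(elastic_net = mse_en, random_forest = mse_rf)
```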