Abstract
With high-throughput technologies now widely available, investigators can easily measure thousands of phenotypes for quantitative trait loci (QTL) mapping. Microarray measurements are particularly amenable to QTL mapping, as evidenced by a number of recent studies demonstrating utility across a broad range of biological endeavors. The early success stories have impelled a rapid increase in both the number and complexity of expression QTL (eQTL) experiments. Consequently, there is a need to consider the statistical principles involved in the design and analysis of these experiments and the methods currently being used. In this article we review these principles and methods and discuss the open questions most likely to yield significant progress toward increasing the amount of meaningful information obtained from eQTL mapping experiments.
Acknowledgments
The authors thank Alan Attie, Meng Chen, Michael Newton, and Brian Yandell for useful discussions and two anonymous reviewers for comments that improved the manuscript. They also thank Stephanie Ciatti for extra help at home.
Cite this article
Kendziorski, C., Wang, P. A review of statistical methods for expression quantitative trait loci mapping. Mamm Genome 17, 509–517 (2006). https://doi.org/10.1007/s00335-005-0189-6