2011 | Book

The Fundamentals of Modern Statistical Genetics

About this book

This book covers the statistical models and methods used to understand human genetics, following the historical and recent developments of the field. From Mendel’s first experiments to genome-wide association studies, the book describes how genetic information can be incorporated into statistical models to discover disease genes. All commonly used approaches in statistical genetics (e.g., aggregation analysis, segregation analysis and linkage analysis) are covered, but the focus of the book is modern approaches to association analysis. Numerous examples of both Mendelian and complex genetic disorders illustrate key points throughout the text. The intended audience is statisticians, biostatisticians, epidemiologists, and quantitatively oriented geneticists and health scientists who want to learn about statistical methods for genetic analysis, whether to better analyze genetic data or to pursue research in methodology. A background in intermediate-level statistical methods is required. The authors include few mathematical derivations, and the exercises provide problems for students with a broad range of skill levels. No background in genetics is assumed.

Table of Contents

Frontmatter
Chapter 1. Introduction to Statistical Genetics and Background in Molecular Genetics
Abstract
An understanding of the basic ideas of inheritance has been evident throughout the history of mankind, ever since the domestication of animals or the practice of farming began. The Babylonians and ancient Egyptians utilized cross pollination of crops and selection of domesticated animals for breeding, but did not develop a formal theory for the principles underlying the inheritance of traits. Later, ancient Greek philosophers developed elementary theories to explain how inheritance worked in humans, grappling unsuccessfully with the apparent paradox that inherited characteristics can sometimes differ between offspring and parents. Some diseases in humans, such as sickle cell anemia and hemophilia, have been recognized as inherited disorders for centuries and, as the science of medicine developed, so too did the recognition that many diseases are heritable.
Nan M. Laird, Christoph Lange
Chapter 2. Principles of Inheritance: Mendel’s Laws and Genetic Models
Abstract
It is difficult to overstate the impact of Mendel’s research on the history of genetics; indeed, his research in genetics has been credited as one of the great experimental advances in biology (Fisher, 1965). Prior to the publication of his results on experimental hybridization in plants, the concept of inheritance of physical ‘units’ (later called genes) was accepted, and scientists had reported on many hybridization experiments in both animals and plants. Yet no one had set forth principles of inheritance which could be used as a universal theory to explain how traits in offspring can be predicted from traits in the parents. Mendel provided an explicit rule for how the genotypes of the offspring can be predicted from the genotypes of their parents, and he also established models for how genotypes were related to traits.
Nan M. Laird, Christoph Lange
Chapter 3. Some Basic Concepts from Population Genetics
Abstract
The study of allele frequencies and how they vary over time and over geographic regions has led to many discoveries concerning evolutionary history, migration, gene flow, and the correlation between allele frequencies and disease rates across populations. This chapter covers only a few concepts from population genetics, emphasizing those most relevant to gene mapping: allele frequency estimation, population substructure, Hardy-Weinberg Equilibrium (HWE) and Disequilibrium (HWD), which are frequently used in the analysis of genetic data. Other concepts, e.g., Linkage Disequilibrium and Linkage Equilibrium, will be introduced in later chapters as the need arises.
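The allele frequency estimation and Hardy-Weinberg check named in this abstract can be illustrated with a minimal sketch. It assumes a single biallelic marker with alleles labeled A and a, genotype counts from unrelated subjects, and a 1-degree-of-freedom chi-square goodness-of-fit test; the function names and the example counts are hypothetical and are not taken from the book.

```python
import math


def allele_frequency(n_AA, n_Aa, n_aa):
    """Estimate the frequency of allele A by counting alleles in genotypes."""
    n = n_AA + n_Aa + n_aa
    if n == 0:
        raise ValueError("need at least one genotyped subject")
    # Each AA subject carries two copies of A, each Aa subject carries one.
    return (2 * n_AA + n_Aa) / (2 * n)


def hwe_chi_square(n_AA, n_Aa, n_aa):
    """Chi-square goodness-of-fit test of Hardy-Weinberg proportions (1 df)."""
    n = n_AA + n_Aa + n_aa
    p = allele_frequency(n_AA, n_Aa, n_aa)
    q = 1.0 - p
    if p in (0.0, 1.0):
        raise ValueError("marker is monomorphic; the test is undefined")
    # Expected genotype counts under HWE: p^2, 2pq, q^2 times the sample size.
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_AA, n_Aa, n_aa)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 df, via the normal tail.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


if __name__ == "__main__":
    # Hypothetical genotype counts, chosen only for illustration.
    chi2, p_value = hwe_chi_square(30, 50, 20)
    print(f"allele frequency of A: {allele_frequency(30, 50, 20):.3f}")
    print(f"chi-square: {chi2:.3f}, p-value: {p_value:.3f}")
```

The chi-square approximation is a simplification: with small samples or rare alleles an exact test is usually preferred.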
Nan M. Laird, Christoph Lange
Chapter 4. Aggregation, Heritability and Segregation Analysis: Modeling Genetic Inheritance Without Genetic Data
Abstract
Aggregation and heritability analyses are designed to show that diseases, or phenotypes more generally, have a genetic basis by investigating patterns of phenotypic correlation between relatives; segregation analysis is used to find support for a specific genetic model underlying the inheritance patterns observed in families. They all involve modeling phenotypic data on families, or pedigrees, without using any genetic data. As such, all were developed during the time when genotyping was expensive, labor intensive, and not widely available. Today, the general concepts used in aggregation and heritability analysis are widely accepted as useful measures of the degree to which traits are inherited; most researchers would not undertake genetic analysis without evidence of aggregation or heritability of the trait. Using segregation analysis to determine the model of inheritance at the disease locus was essential in planning parametric linkage analyses, as described in Chapter 6, but the current popularity of non-parametric linkage analysis and association analysis has put segregation analysis somewhat on the sideline. Although this chapter can be skipped if the reader’s primary interest is association, our coverage of these methods is brief and the concepts are useful to anyone with an interest in statistical genetics. In particular, the approach used to construct a likelihood for pedigree data given in Section 4.1 serves as a basis for other analyses in linkage and association discussed in later chapters.
Nan M. Laird, Christoph Lange
Chapter 5. The General Concepts of Gene Mapping: Linkage, Association, Linkage Disequilibrium and Marker Maps
Abstract
In the absence of genetic data at the molecular level, the results of heritability, aggregation and/or segregation analysis provided the first hints about the presence of genetic effects and, consequently, the existence of a disease gene. Without information on the etiology of the disease or gene functionality, the next natural question is: ‘Where is the disease locus located in the genome?’ Although disease genes have now been located for most very rare Mendelian disorders, the search for the genomic location of disease genes for complex disorders has proven to be a difficult task.
Nan M. Laird, Christoph Lange
Chapter 6. Basic Concepts of Linkage Analysis
Abstract
The goal of linkage analysis in human disease gene mapping is to assess whether an observed genetic marker locus is physically linked to the disease locus. This is equivalent to testing the null hypothesis that the recombination fraction between the marker locus and the disease locus, θ, equals ½; when θ = ½, we say the marker locus and the disease locus are unlinked. It is also possible to estimate θ, which gives an approximate idea of the location of the disease susceptibility locus (DSL) relative to observed markers. In this chapter, we discuss the basic concepts of parametric linkage analysis. We explain how linkage between two genetic loci can be utilized to construct long-range mapping approaches that require only a small number of marker loci per chromosome to cover the entire human genome sufficiently. Using fully parameterized statistical models, parametric linkage describes the phenotype as a function of the genetic marker locus and its relative distance to the disease locus, i.e., the recombination fraction (Ott (1999)). The simplest case of parametric linkage analysis uses the method of direct counting, where θ can be estimated by directly counting recombinant and non-recombinant offspring haplotypes (Ott (1979)). Using the method of direct counting, we outline the principles of parametric linkage analysis. Advanced topics such as non-parametric linkage analysis and multi-point analysis (Kruglyak et al. (1996)) are discussed in Appendix A. While the advanced topics included in Appendix A are necessary for a thorough grounding in linkage analysis, they are not required for an introduction to association analysis.
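As a rough illustration of the direct-counting idea described in this abstract, the sketch below estimates θ as the proportion of recombinant offspring haplotypes and compares it with the null value θ = ½ using a LOD score (a base-10 log likelihood ratio, a standard summary in linkage analysis). It assumes phase is known so that recombinants can be counted unambiguously; the function names and the counts are hypothetical.

```python
import math


def estimate_theta(recombinants, non_recombinants):
    """Direct-counting estimate of the recombination fraction, capped at 1/2."""
    n = recombinants + non_recombinants
    if n == 0:
        raise ValueError("need at least one informative offspring haplotype")
    # Values above 1/2 are not biologically meaningful, so truncate at 1/2.
    return min(recombinants / n, 0.5)


def _log10_pow(prob, count):
    """Return log10(prob ** count), treating 0 ** 0 as 1."""
    if count == 0:
        return 0.0
    if prob == 0.0:
        return float("-inf")
    return count * math.log10(prob)


def lod_score(theta, recombinants, non_recombinants):
    """LOD score: log10 of L(theta) / L(1/2) for binomial recombinant counts."""
    n = recombinants + non_recombinants
    log_lik_theta = (_log10_pow(theta, recombinants)
                     + _log10_pow(1.0 - theta, non_recombinants))
    log_lik_null = n * math.log10(0.5)
    return log_lik_theta - log_lik_null


if __name__ == "__main__":
    # Hypothetical counts: 2 recombinant and 8 non-recombinant haplotypes.
    theta_hat = estimate_theta(2, 8)
    print(f"theta estimate: {theta_hat:.2f}")
    print(f"LOD at the estimate: {lod_score(theta_hat, 2, 8):.3f}")
```

A LOD score near zero indicates that the data do not distinguish the estimated θ from the unlinked value ½; larger positive values favor linkage.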
Nan M. Laird, Christoph Lange
Chapter 7. The Basics of Genetic Association Analysis
Abstract
A genetic association analysis is not fundamentally different from any other statistical association analysis. The objective is to establish an association between two variables: a disease trait and a genetic marker. The disease trait can be dichotomous; a measured variable, such as lung function or a quantitative measure of obesity; or the time to onset of a disease or disorder. The genetic marker can be a known or suspected disease-causing mutation, or a marker without any known effects on DNA coding. In the latter case, the association is created by linkage disequilibrium (LD) between the marker and the disease allele, as discussed in Chapter 5. Another distinctive feature of genetic association analysis is that two quite different study designs can be used: one that uses only unrelated subjects, and another that uses families with at least two members who have genetic marker data. Family designs have distinct advantages and disadvantages, and are an important class of studies. This chapter deals with study designs that use unrelated subjects; Chapter 9 considers designs for association analysis that use data on families.
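For a dichotomous trait and unrelated subjects, one simple way to test the marker-trait association described here is an allelic chi-square test on a 2x2 table of allele counts in cases and controls. The sketch below is an illustration under stated assumptions, not the book's procedure: it assumes a biallelic marker, treats the two alleles of each subject as independent (which relies on Hardy-Weinberg proportions), and uses hypothetical function names and counts.

```python
import math


def allele_counts(n_AA, n_Aa, n_aa):
    """Convert genotype counts into counts of alleles A and a."""
    return 2 * n_AA + n_Aa, 2 * n_aa + n_Aa


def allelic_association(case_genotypes, control_genotypes):
    """Allelic chi-square test (1 df) and odds ratio for a case-control sample.

    Each argument is a tuple (n_AA, n_Aa, n_aa) of genotype counts.
    """
    a, b = allele_counts(*case_genotypes)      # alleles A, a in cases
    c, d = allele_counts(*control_genotypes)   # alleles A, a in controls
    total = a + b + c + d
    margins = (a + b) * (c + d) * (a + c) * (b + d)
    if margins == 0:
        raise ValueError("an empty row or column makes the test undefined")
    # Pearson chi-square statistic for a 2x2 table, without continuity correction.
    chi2 = total * (a * d - b * c) ** 2 / margins
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    # Odds of carrying allele A in cases relative to controls.
    odds_ratio = (a * d) / (b * c) if b * c > 0 else float("inf")
    return chi2, p_value, odds_ratio


if __name__ == "__main__":
    # Hypothetical genotype counts (AA, Aa, aa) for 100 cases and 100 controls.
    chi2, p_value, odds_ratio = allelic_association((30, 50, 20), (20, 50, 30))
    print(f"chi-square: {chi2:.2f}, p-value: {p_value:.4f}, OR: {odds_ratio:.2f}")
```

A genotype-based or trend test avoids the Hardy-Weinberg assumption and is often preferred in practice; the allelic test is used here only because it is the shortest to write down.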
Nan M. Laird, Christoph Lange
Chapter 8. Population Substructure in Association Studies
Abstract
Genetic association studies using population-based designs have distinct features that make them an attractive approach for gene mapping. Similar to epidemiological studies, they typically use unrelated individuals. As a consequence, study recruitment is relatively easy and the statistical analysis is straightforward to implement using standard statistical techniques. This gives population-based designs an advantage over other designs. Since epidemiological studies have a long tradition in biomedical research and are available for many complex diseases that are expected to have a genetic component, existing epidemiological studies can be converted into genetic association studies without much effort if the DNA of the study subjects is available, e.g., from blood samples. The study subjects have to be genotyped at the genetic marker loci, but often no additional phenotyping or even recruitment of subjects is required.
Nan M. Laird, Christoph Lange
Chapter 9. Association Analysis in Family Designs
Abstract
The use of family data has a long history in genetics, for association as well as linkage and segregation. Perhaps the simplest and most intuitively obvious example involving association analysis is a study comparing the genotypes in cases with the genotypes in their unaffected siblings. By using an unaffected sibling as a control, we eliminate the confounding by population substructure that arises when affected cases are compared with unaffected controls whose genetic backgrounds are not comparable to those of the cases. With family controls, rejecting the null hypothesis of no association implies more than just association. Finding a significant difference in genotype frequencies between cases and their unaffected siblings implies that the marker is both linked and associated with a disease locus underlying the trait (Appendix C).
Nan M. Laird, Christoph Lange
Chapter 10. Advanced Topics
Abstract
In this chapter we review specialized and advanced topics that cannot be covered in detail in an introductory textbook. However, the topics are important research areas, and the interested reader is encouraged to follow up our brief introduction with the specialized literature.
Nan M. Laird, Christoph Lange
Chapter 11. Genome Wide Association Studies
Abstract
The key requirement for genetic association, linkage disequilibrium (LD), is a short-distance property that extends only for a limited physical distance across the human genome. As we showed in Chapter 7, if there is low LD between the genotyped marker and the DSL, there will be low power to detect association between the disease and the DSL. In the early years of association testing, the strategy was mainly used to test specific regions, e.g., genes which were selected on the basis of function relative to the biology of the disease, or on the basis of linkage analysis. By restricting testing to a small enough region, markers can be selected for testing which should be in LD with the DSL anywhere in the region. In particular, SNPs in the coding region of a gene are often chosen as markers. With Genome Wide Association Studies (GWAS) the idea is instead to cover the entire genome with a set of SNPs dense enough that all untyped polymorphisms (including DSLs) are in reasonably high LD with a tested SNP. For this reason, GWAS are sometimes called ‘unbiased’ because every region of the genome is searched, not just those meeting determined selection criteria.
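The strength of LD between a tested SNP and an untyped polymorphism is commonly summarized by the squared correlation r² between the two loci. The sketch below computes the disequilibrium coefficient D and r² from phased haplotype counts; it assumes two biallelic loci with alleles labeled A/a and B/b and known phase, and the function name and counts are hypothetical rather than taken from the book.

```python
def ld_measures(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r_squared) for two biallelic loci from haplotype counts."""
    n = n_AB + n_Ab + n_aB + n_ab
    if n == 0:
        raise ValueError("need at least one haplotype")
    p_A = (n_AB + n_Ab) / n   # frequency of allele A at the first locus
    p_B = (n_AB + n_aB) / n   # frequency of allele B at the second locus
    p_AB = n_AB / n           # frequency of the A-B haplotype
    # D measures the departure of the haplotype frequency from independence.
    d = p_AB - p_A * p_B
    denom = p_A * (1.0 - p_A) * p_B * (1.0 - p_B)
    if denom == 0.0:
        raise ValueError("both loci must be polymorphic")
    return d, d * d / denom


if __name__ == "__main__":
    # Hypothetical haplotype counts, chosen only for illustration.
    d, r2 = ld_measures(40, 10, 10, 40)
    print(f"D = {d:.3f}, r^2 = {r2:.3f}")
```

An r² close to 1 means the tested SNP is a near-perfect proxy for the untyped variant, while an r² close to 0 means the tested SNP carries almost no information about it.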
Nan M. Laird, Christoph Lange
Chapter 12. Looking Toward the Future
Abstract
While genome wide association studies have led to the identification of robust associations for many complex disease phenotypes, they are typically not able to explain the amount of phenotypic variability that has been attributed to genetic factors by heritability studies such as those discussed in Chapter 4. For example, for the phenotype height, heritability studies suggest that about 70% of the phenotypic variability is attributable to genetic factors. However, so far, GWAS for height have identified variants that explain a substantially smaller proportion of the genetic variance of this highly heritable trait (Weedon et al. (2008); Yang et al. (2010)). There are numerous reasons for this ‘missing heritability’, perhaps the most obvious being that the SNPs analyzed are likely only proxies for the real DSLs and that low-frequency SNPs are difficult to detect.
Nan M. Laird, Christoph Lange
Backmatter
Metadata
Title
The Fundamentals of Modern Statistical Genetics
Written by
Nan M. Laird
Christoph Lange
Copyright year
2011
Publisher
Springer New York
Electronic ISBN
978-1-4419-7338-2
Print ISBN
978-1-4419-7337-5
DOI
https://doi.org/10.1007/978-1-4419-7338-2