Introduction

Studying adaptation and the genetic bases of adaptive traits is an ambitious but daunting enterprise, especially for complex traits that have a polygenic basis and are strongly influenced by the environment. Indeed, uncovering evidence of genetic adaptation is almost always hampered by the pervasive effects of phenomena such as genetic drift, phenotypic plasticity, complex demographic history and complex genetic architecture. In the particular case of local adaptation, evolutionary biologists have developed efficient tools to overcome these challenges, and the common garden experiment is one of them. The rationale behind this protocol is to control for the effects of phenotypic plasticity and, to a certain extent, genotype-by-environment interactions by growing individuals from different populations in a common environment, and to use the quantitative genetics toolbox (see Box 1) to study the genetic bases of complex traits (for example, life history, morphological and physiological traits).

Because it makes it possible to unravel the genetic basis of complex phenotypes across various populations without the confounding effects of their respective environments, the common garden experiment is used to test for signals of local adaptation in traits of interest such as life history traits (Kawakami et al., 2011), phenology (Brachi et al., 2013) and allometric relationships (Gonda et al., 2011). Local adaptation might be suspected because of the existence of an environmental gradient such as latitude (Toräng et al., 2015) or altitude (Alberto et al., 2011), or because of the existence of several contrasting environments, such as sea and fresh water (DeFaveri and Merilä, 2014). In addition, common garden experiments are used to study the consequences of local adaptation for conservation (McKay et al., 2001) or even for ecosystem functioning (Bassar et al., 2010). Despite its name, and although it has been used extensively with plants (Linhart and Grant, 1996), this experimental approach can also be applied to a large variety of organisms, including fish (Bassar et al., 2010; DeFaveri and Merilä, 2014), invertebrates (Spitze, 1993; Luttikhuizen et al., 2003) and small mammals (Bozinovic et al., 2009). The main limitations of this experimental design are the ability to breed the species and to rear the offspring in laboratory or seminatural conditions. Common garden experiments can also be used to study genotype-by-environment interactions, by implementing the same design in different environments. Although replicating common garden experiments is logistically challenging, the outcomes of such experiments are highly rewarding, as genotype-by-environment effects are likely common and important in the wild (Stinchcombe, 2014).
Note finally that, although common garden experiments are closely related to reciprocal transplant experiments (which aim to test for local adaptation by showing that the average fitness of local individuals is higher than that of aliens; see, for example, Ågren and Schemske, 2012), there are important philosophical and practical differences between the two types of experiments. Reciprocal transplants are designed to demonstrate local adaptation, whereas common gardens are designed to study the genetic bases of traits, regardless of whether they are adaptive or not. In practice, reciprocal transplants will typically create differential survival, because local individuals survive better. This is a confounding effect during the quantitative genetic analysis, because only the phenotypes of ‘fit’ individuals are available. Common gardens, by contrast, are often designed to be ‘softer’ on the individuals. Nevertheless, most of the elements in this article regarding common garden experiments can also be applied to reciprocal transplants, especially if one is interested in applying them to survival or some other measure of fitness.

To perform the quantitative genetic analyses of the studied traits, individuals from controlled families (that is, groups of individuals with known genealogy) are used. The average relatedness between individuals, derived from this known genealogy, makes it possible to infer the within-population additive genetic variance VA, whereas the effect of the population of origin makes it possible to infer the between-population additive genetic variance Vpop. This is so because all individuals share the same environment and, therefore, any average difference between populations must have a genetic origin. The residual variance VR accounts for all other kinds of effects (for example, environmental). These variance components can be used to estimate the heritability of the trait:

$$h^2 = \frac{V_A}{V_A + V_R}$$

It is also possible to estimate QST, a standardised measure of genetic differentiation for quantitative traits (Spitze, 1993; Edelaar et al., 2011). QST is defined as the ratio of the among-population (additive) genetic variance Vpop over the total genetic variance (that is, also including the within-population additive variance VA) and, in the case of diploid species, is given by:

$$Q_{ST} = \frac{V_{pop}}{V_{pop} + 2V_A}$$

This parameter is a quantitative analogue of population genetics’ FST and, under a hypothesis of neutrality, both should be equal. Hence, a common approach for distinguishing between neutral drift and local adaptation scenarios is to compare QSTs and FSTs. Consequently, individuals from a common garden experiment are typically genotyped to compute FST.
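The two ratios defined above translate directly into code. The following is a minimal sketch, assuming the variance components have already been estimated (for example, with an animal model). The FST function is a naive, uncorrected multilocus version of Wright's definition, included only to show which quantity QST is compared with; in practice, estimators corrected for sample size are preferable. All input values are hypothetical.

```python
import numpy as np

def heritability(v_a: float, v_r: float) -> float:
    """Narrow-sense heritability h2 = VA / (VA + VR)."""
    return v_a / (v_a + v_r)

def qst(v_pop: float, v_a: float) -> float:
    """QST for a diploid species: Vpop / (Vpop + 2 VA)."""
    return v_pop / (v_pop + 2.0 * v_a)

def fst_wright(freqs) -> float:
    """Naive multilocus FST = sum_l Var(p_l) / sum_l pbar_l (1 - pbar_l).

    freqs: array of shape (n_populations, n_loci) of allele frequencies.
    No correction for sample size or number of populations is applied.
    """
    freqs = np.asarray(freqs, dtype=float)
    p_bar = freqs.mean(axis=0)
    var_p = freqs.var(axis=0)  # variance among populations (ddof = 0)
    return float(var_p.sum() / (p_bar * (1.0 - p_bar)).sum())

# Hypothetical variance components and allele frequencies
print(heritability(v_a=1.0, v_r=3.0))          # 0.25
print(qst(v_pop=1.0, v_a=2.0))                 # 0.2
print(round(fst_wright([[0.2], [0.8]]), 4))    # 0.36
```

A classical comparison would then contrast the QST of a trait with the FST computed from neutral markers, keeping in mind that both are noisy estimates.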

Despite the advantages of common garden experiments, the study of local adaptation in non-model species during the past decade has been strongly driven by the study of genetic markers in natural populations (Luikart et al., 2003). Typically, evolutionary biologists sample tissue from individuals in natural populations, genotype them with high-throughput methods and then perform a genome scan analysis of selection (see, for example, Eckert et al., 2010; Bourret et al., 2013; Fischer et al., 2013). Although this method can be quite powerful, it has some limitations (for example, false positives and no information on the adaptive phenotype). Several calls have been made to validate the results of such analyses independently (see Buehler et al., 2014 for a striking example), possibly using common garden or reciprocal transplant experiments (Holderegger et al., 2008; Pardo-Diaz et al., 2014; Rellstab et al., 2015). Along these lines, this perspective paper addresses three main questions. Where does the common garden experiment stand in the genomic era? In particular, what can common garden experiments bring to population genomics? Conversely, how can techniques from the genomic fields (for example, high-throughput genotyping and model-based inference of neutral evolution) extend the range and scope of common gardens?

It is important to note that population genomics aims at linking genotypes and environments through genome scan methods but often neglects the phenotypic traits under potential selection. There is much to gain by adding phenotypes into the equation (Cushman, 2014). Yet, because phenotypic plasticity is hard to distinguish from local adaptation in wild populations, it seems pointless, or at least dubious, to use phenotypes obtained directly in the field. This simple fact lies at the heart of common garden experiments, and we suggest here that this approach is ideally suited to jointly study genotypes, phenotypes and environments, especially when combined with high-throughput genotyping and powerful statistical methods. After a short introduction to the different high-throughput genotyping methods available in the context of a common garden experiment, we will discuss how those methods and powerful statistical tools can rejuvenate this classical approach. Finally, we will discuss the complementarity between population genomics and common garden experiments, and how an integrative analysis can deepen our understanding of local adaptation.

High-throughput genotyping in the context of a common garden

High-throughput genotyping refers to any genotyping method yielding a large number of markers, thus providing a dense marker panel across the genome. Given the focus on non-model species in this paper, we consider as few as 10 000 independent markers as fairly ‘dense’, provided that the genome of the species is not too large. For example, 10 000 single-nucleotide polymorphisms (SNPs) in a genome of 100 Mbp would represent 3% of all SNPs if a SNP occurs every 300 bp.
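The arithmetic of the example above can be checked with a few lines. This sketch assumes, as in the text, that SNPs are evenly spaced (one every 300 bp); it ignores whether markers are independent, and the function name is hypothetical.

```python
def marker_coverage(n_markers: int, genome_size_bp: float, snp_spacing_bp: float):
    """Return (fraction of all SNPs captured, mean distance between markers in bp)."""
    total_snps = genome_size_bp / snp_spacing_bp   # 100 Mbp / 300 bp, about 333 000 SNPs
    return n_markers / total_snps, genome_size_bp / n_markers

fraction, spacing = marker_coverage(10_000, 100e6, 300)
print(f"{fraction:.1%} of all SNPs, one marker every {spacing:,.0f} bp")
# 3.0% of all SNPs, one marker every 10,000 bp
```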

The most straightforward high-throughput genotyping method is whole-genome sequencing. This method yields the largest possible number of markers and offers the densest genotyping. However, this technique requires DNA of high quality and quantity, substantial bioinformatic computing power and, most importantly, access to genomic resources (for example, a genome assembly) within a relatively short phylogenetic range. The huge number of markers generated can also be problematic during the analyses because of high computation/memory requirements, high redundancy in information between linked markers and a low signal-to-noise ratio. Still, whole-genome sequencing is the ultimate high-throughput genotyping method, yielding up to millions of SNP markers throughout the whole genome. With decreasing costs and an increasing number of species whose whole genome has been sequenced, it might soon become a recommended technique even for non-model species. SNP genotyping chips are a cheaper alternative to whole-genome sequencing, although most of the limitations above still apply.

For now, the approach likely to be best suited for non-model species is genome representation sequencing. The overall principle of this approach is to sequence only restricted, but random, parts of the genome in order to decrease the sequencing effort, and hence the overall costs and computational efforts associated with genotyping. To do so, these approaches mainly use DNA digestion by restriction enzymes, followed by ligation of tags and primers and PCR amplification. This is akin to the principle underlying amplified fragment length polymorphism (AFLP) genotyping (Vos et al., 1995). Here, however, the DNA fragments (or at least some of them) are partially sequenced (100 bp) using next-generation sequencing technology such as Illumina HiSeq (Illumina Inc., San Diego, CA, USA). This kind of approach includes the genotyping-by-sequencing method (Elshire et al., 2011) and the family of restriction site-associated DNA sequencing methods (Miller et al., 2007; Baird et al., 2008).
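The digestion step can be illustrated with a toy in silico digest. This sketch assumes a single enzyme, EcoRI (recognition site GAATTC, cut after the first base); because this site is palindromic, searching the forward strand is enough. Tag and primer ligation, PCR and any size selection are not modelled, and the sequence is hypothetical.

```python
def find_sites(seq: str, site: str) -> list[int]:
    """Start positions of every occurrence of a recognition site (forward strand)."""
    positions, start = [], seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def digest(seq: str, site: str = "GAATTC", cut_offset: int = 1) -> list[str]:
    """Cut seq at each recognition site; cut_offset is the cut position within
    the site (EcoRI cuts G^AATTC, i.e. after the first base)."""
    cuts = [pos + cut_offset for pos in find_sites(seq.upper(), site)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]

fragments = digest("AAGAATTCTTGAATTCCC")
print(fragments)  # ['AAG', 'AATTCTTG', 'AATTCCC']
```

Applied to a draft genome, counting the resulting fragments gives a rough idea of how many loci a given enzyme might yield.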

The sequences obtained are then processed through quality checks (that is, selecting reads according to their sequencing quality, local coverage, availability over all or most individuals and so on) and SNP calling pipelines in order to identify SNP markers. Note that, contrary to AFLP markers, markers derived from restriction site-associated DNA sequencing preferentially come from nonpolymorphic restriction sites and are codominant. Alternatively, when more than one SNP is present on a 100-bp sequence, the SNPs can be combined into a new marker with more than two alleles. The rationale is that very close SNPs are likely to be strongly associated because of physical linkage, in which case fewer but independent markers with more alleles are often preferable to strongly linked SNPs. Genome representation protocols can yield up to several hundreds of thousands of SNPs, but more typically tens of thousands, at a cost comparable to, or up to 10 times, that of an AFLP analysis.
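Combining several SNPs from the same short sequence into one multiallelic marker amounts to treating each distinct haplotype as an allele. This is a minimal sketch, assuming the haplotypes are already phased (plausible when a single read spans all SNPs of the locus); the individuals and haplotypes are hypothetical, and coding alleles by decreasing frequency is an arbitrary choice.

```python
from collections import Counter

def haplotype_alleles(haplotypes: list[str]) -> dict[str, int]:
    """Map each distinct haplotype (the SNP bases of one short locus) to an
    integer allele code, ordered by decreasing frequency, ties alphabetically."""
    counts = Counter(haplotypes)
    ordered = sorted(counts, key=lambda h: (-counts[h], h))
    return {h: code for code, h in enumerate(ordered)}

# Hypothetical phased haplotypes at one locus carrying two SNPs;
# each diploid individual contributes two haplotypes.
individuals = {"ind1": ("AT", "GT"), "ind2": ("AT", "AA"), "ind3": ("GT", "GT")}
all_haplos = [h for pair in individuals.values() for h in pair]
codes = haplotype_alleles(all_haplos)
genotypes = {ind: tuple(sorted(codes[h] for h in pair)) for ind, pair in individuals.items()}
print(codes)      # {'GT': 0, 'AT': 1, 'AA': 2}
print(genotypes)  # {'ind1': (0, 1), 'ind2': (1, 2), 'ind3': (0, 0)}
```

The two biallelic SNPs thus become a single triallelic marker, removing the redundancy caused by their physical linkage.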

From all of the above, it is clear that next-generation sequencing makes it possible to generate a very large number of markers at a moderate cost. Compared with AFLP markers, next-generation sequencing marker panels are denser, and the markers are codominant and less arbitrary in their interpretation (that is, no ‘binning’ process), hence better in every way, except possibly for their cost. Microsatellites, on the other hand, are very different: they usually provide very sparse panels (up to a few dozen markers), but are highly mutable and have a large allelic diversity. Although it has been argued that microsatellites are better markers to infer relatedness (Ritland, 2000), they typically yield smaller relatedness estimates than SNP or AFLP markers because of their higher mutation rates (Uptmoor et al., 2003; El Rabey et al., 2013). They also yield smaller FST estimates (Edelaar and Björklund, 2011) for the same reason. Finally, although in theory more accurate than SNPs for the same number of loci, they typically yield one to two orders of magnitude fewer loci, and hence they are less accurate in practice (Uptmoor et al., 2003).

A key issue is the number of individuals that need to be genotyped. Our view is that ideally all individuals from the experimental garden(s) should be genotyped, because this opens the way toward the more refined or novel analyses detailed below. However, some of the analyses suggested here (for example, genome scans) can be performed even when only a subsample of individuals has been genotyped. De Kort et al. (2014), for example, sampled one individual per family in their common garden experiment to combine it with population genomics (that is, genome scan) analyses. This cheaper subsampling procedure might be very attractive to researchers who are not interested in individual genotypes, that is, neither in relatedness inference nor in the genome-wide association studies described below.

Common gardens 2.0: new markers and new methods

We are certainly not the first to encourage the evolutionary biology community to switch toward next-generation sequencing technology (Luikart et al., 2003; Savolainen et al., 2013), and it is clear that such a ‘revolution’ is already happening (reviewed in Pardo-Diaz et al., 2014). However, we wish here to emphasise the value of dense marker panels in the context of a common garden experiment.

As stated above, the study of the genetics of complex traits such as those measured in common garden experiments relies strongly on the relatedness between individuals, which is often assumed, especially when individuals are siblings (see, for example, Hernández-Serrano et al., 2014). Yet, contrary to the parent–offspring relationship, the relatedness between siblings varies: the commonly used value of 0.25 between half-sibs, for example, is only an average, expected value. Hence, using realised relatedness, inferred from molecular data, can yield better estimates in the sense that (1) they are more robust to errors in the kinship assessment (for example, full-sibs instead of half-sibs) and (2) they reflect more accurately the variation in relatedness between siblings. Better relatedness estimates are useful because they improve the precision of the estimates of h2 and QST. Note, however, that many markers are typically needed to obtain precise molecular estimates of relatedness (Uptmoor et al., 2003). Dense markers provided by high-throughput genotyping naturally fulfil this requirement.
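Realised relatedness can be estimated from dense SNP data with a genomic relationship matrix. The sketch below uses a VanRaden-style estimator, one common molecular estimator and not necessarily the one used by the studies cited here, on simulated data. It assumes known allele frequencies and unlinked loci; because of the latter assumption, simulated realised relatedness varies much less around its expectation than it would in a real genome with linkage. All parameter values are hypothetical.

```python
import numpy as np

def genomic_relatedness(genotypes, p):
    """VanRaden-style genomic relationship matrix; off-diagonal entries
    estimate the relatedness r (twice the kinship coefficient)."""
    z = np.asarray(genotypes, dtype=float) - 2.0 * p  # centre on expected allele counts
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))

def make_founder(p, rng):
    """Unrelated diploid individual: two haplotypes drawn from allele frequencies p."""
    return (rng.random((2, p.size)) < p).astype(int)

def make_offspring(mother, father, rng):
    """Each parent transmits one of its two alleles at every locus (loci assumed unlinked)."""
    loci = np.arange(mother.shape[1])
    from_mother = mother[rng.integers(0, 2, size=loci.size), loci]
    from_father = father[rng.integers(0, 2, size=loci.size), loci]
    return np.vstack([from_mother, from_father])

rng = np.random.default_rng(1)
p = rng.uniform(0.05, 0.95, size=20_000)             # hypothetical allele frequencies
dam, sire1, sire2 = (make_founder(p, rng) for _ in range(3))
kids = [make_offspring(dam, sire1, rng),             # kids 0 and 1: full-sibs
        make_offspring(dam, sire1, rng),
        make_offspring(dam, sire2, rng)]             # kid 2: half-sib of kids 0 and 1
genotypes = np.array([k.sum(axis=0) for k in kids])  # allele counts 0/1/2
G = genomic_relatedness(genotypes, p)
print(np.round(G, 2))  # expected values: 0.5 for G[0, 1], 0.25 for G[0, 2] and G[1, 2]
```

Replacing the assumed pedigree values (0.5, 0.25) with the entries of G is what "using realised relatedness" means in practice.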

A large number of markers also allows the reconstruction of the family structure. Indeed, even when relatedness is precisely estimated, the family structure (that is, who is the mother/father of each individual, which individuals are full- or half-sibs) is of utmost importance in order to account for many confounding effects such as dominance (Wolak and Keller, 2014), parental effects (for example, maternal effects; Wilson et al., 2010) or selfing (Gauzere et al., 2013). Note that maternal effects can also be accounted for by weighing seeds (in plants, Roach and Wulff, 1987) or reduced by using F2 generations (Roach and Wulff, 1987; Mousseau and Dingle, 1991). However, the possibility of using one of these methods will strongly depend on the studied species. According to Jones et al. (2010), brood size is one of the biggest limitations of parental reconstruction algorithms because of unsampled alleles when too few segregating individuals are available. With many markers, even ones with low levels of polymorphism (such as SNPs), this is no longer an issue, as it becomes possible to reconstruct a large-enough proportion of the parental genomes to obtain high certainty of assignment, even for small brood sizes. Now that efficient algorithms such as those implemented in COLONY (Jones and Wang, 2010; Wang, 2012) are available, the number of markers should not be a problem. This software makes it possible to reconstruct the family structure, as well as to infer parental genotypes, while accounting for selfing or genotyping errors. Indeed, one crucial issue for parental inference with a large number of markers is to model possible genotyping errors, which, if ignored, can severely bias the results (Wang, 2004).
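To illustrate why genotyping errors matter for parental inference, the sketch below applies a simple opposite-homozygote exclusion test: a true parent and its offspring should never be homozygous for different alleles, so any such mismatch must come from a genotyping error (or a mutation). With many markers, a few mismatches are expected even for true parents, hence the tolerance. This is a crude, hypothetical rule, not the likelihood-based approach implemented in COLONY; the error rate, the `slack` factor and the genotypes are illustrative assumptions.

```python
import numpy as np

MISSING = -1  # assumed code for missing genotypes

def opposite_homozygote_rate(offspring, candidate, missing=MISSING):
    """Fraction of jointly genotyped loci where one individual is 0 and the other 2."""
    a = np.asarray(offspring)
    b = np.asarray(candidate)
    typed = (a != missing) & (b != missing)
    opposite = typed & (np.abs(a - b) == 2)
    return float(opposite.sum() / typed.sum())

def plausible_parent(offspring, candidate, error_rate=0.01, slack=3.0):
    """Hypothetical decision rule: accept the candidate if its mismatch rate does
    not exceed `slack` times the assumed per-locus genotyping error rate."""
    return opposite_homozygote_rate(offspring, candidate) <= slack * error_rate

offspring = [0, 2, 1, 0, 1, MISSING]
true_parent = [0, 2, 2, 1, 0, 2]
unrelated = [2, 0, 1, 2, 1, 0]
print(opposite_homozygote_rate(offspring, true_parent))  # 0.0
print(opposite_homozygote_rate(offspring, unrelated))    # 0.6
```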

The most innovative statistical method designed specifically for common garden data is probably the one developed by Ovaskainen et al. (2011), which overcomes several problems associated with the classical FST–QST comparisons. To avoid clumsy comparisons between two noisy estimators, Ovaskainen et al. (2011) conceived a model of neutral phenotypic differentiation between populations, which is compared with the phenotypic differentiation measured in a common garden experiment (that is, the genetic differentiation linked to the phenotype). When suspiciously strong phenotypic differentiation is observed compared with the neutral expectation, a local adaptation hypothesis can be proposed. The neutral model of phenotypic differentiation is a combination of a within-population ‘animal model’ (see Kruuk, 2004 for a description of the model) and an among-population ‘F-model’ (see Gaggiotti and Foll, 2010 for a description of the model) of phenotypic evolution (Karhunen and Ovaskainen, 2012). This model also allows a multivariate genetic analysis to be performed, for example, to infer genetic correlations and a G matrix. This is a perfect illustration of how models emerging from the field of population genomics (here the F-model) can be used to dramatically improve the analysis of common garden data sets. This method has been implemented in the DRIFTSEL package (Karhunen et al., 2013). Using this method, Karhunen et al. (2014) demonstrated the presence of strong footprints of local adaptation in several populations of nine-spined stickleback (Pungitius pungitius).

What is the use of common garden experiments in the genomic era?

It is well known in the field of genome-wide association studies, which aim at uncovering the loci responsible for phenotypic variation, that such analyses should be performed with extreme caution because of the potential effect of hidden population structure. Especially important are the combined effects of genetic drift and gene flow, and the confounding effect of phenotypic plasticity. However, both of these problems can be overcome. Population structure can be accounted for by using appropriate models (see, for example, Nicholson et al., 2002; Beaumont and Balding, 2004) or methods (Frichot et al., 2013) from the genome scan literature. The second problem, on the other hand, is perfectly addressed by common garden experiments, which were specifically designed to control for phenotypic plasticity.

As a result, combining common garden experiments on non-model species with genome-wide association studies provides an opportunity for multiple-population genome-wide association studies (Brachi et al., 2013; Slavov et al., 2014). For a locally adapted trait, it would even be possible to distinguish markers explaining among-population phenotypic variability (by testing for among-population effects) from markers explaining within-population variability (by testing for within-population effects). The technique of within-group centring (Davis et al., 1961; van de Pol and Wright, 2009) could be used to this end. It simply consists of distinguishing between the mean-population effect and the within-population effect of each predictor of an association model, as follows:

$$y_{ij} = \mu + \beta_B \bar{x}_j + \beta_W (x_{ij} - \bar{x}_j) + u_j + e_{ij}$$

where yij is the phenotype of individual i in population j, xij is its genotype and x̄j the mean genotype in population j. The parameters μ, βB and βW are the fixed effects of the model. Note that the within-population effects can be tested separately for each population by using a parameter βWj for each population j. The term uj stands for any population structure correction and eij is the residual. This equation is simply an illustration of within-group centring and does not constitute a model per se. Accounting for population structure should help in distinguishing between neutral and selective scenarios for markers associated with between-population variability. As always (Korte and Farlow, 2013), the power of a genome-wide association study to actually detect loci linked to phenotypic variability strongly depends on the extent of linkage disequilibrium and the density of markers along the genome, in addition to the sample size. Hence, the most useful, but most expensive, genotyping method for this kind of analysis is whole-genome sequencing. Note also that heterogeneity in recombination/mutation rates along the genome can generate false positives during such analyses (Korte and Farlow, 2013). Here, the number of populations is also important, as it determines the power to detect a significant effect for the parameter βB. Note that Brachi et al. (2013) used a different, multiscale approach (from local to worldwide variation) and found very different results depending on the scale at which local adaptation was studied.

The approach that is probably the most typical of the genomic era is to scan genomes for signals of selection (mostly selective sweeps and local adaptation). Many methods have been developed in the past decades to detect local adaptation (Beaumont and Balding, 2004; Foll and Gaggiotti, 2008; Bonhomme et al., 2010; Coop et al., 2010; Frichot et al., 2013; Duforet-Frebourg et al., 2014; Guillot et al., 2014).
Despite considerable efforts to account for population structure, these methods have been shown to display high error rates (de Villemereuil et al., 2014; Lotterhos and Whitlock, 2014). Hence, the results of a genome scan must always be validated using independent tests. Gene ontology and pathway analyses are the most common means of checking these results. However, it has been suggested that common garden experiments might be a very efficient complement to those analyses (De Kort et al., 2014; Lepais and Bacles, 2014; Rellstab et al., 2015).
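The within-group centring model of a multiple-population genome-wide association study, described earlier in this section, can be sketched for a single marker with ordinary least squares. This is a simplification: the population-structure term uj is omitted, and the genotypes and noise-free phenotypes are hypothetical, chosen so that the fit should recover the coefficients used to build them.

```python
import numpy as np

def within_group_centring(x, pop):
    """Split a predictor into its population mean (between part) and each
    individual's deviation from that mean (within part)."""
    x = np.asarray(x, dtype=float)
    _, idx = np.unique(np.asarray(pop), return_inverse=True)
    idx = idx.ravel()  # population index of each individual
    pop_means = np.bincount(idx, weights=x) / np.bincount(idx)
    x_between = pop_means[idx]
    return x_between, x - x_between

def fit_centred_model(y, x, pop):
    """OLS fit of y = mu + beta_B * xbar_j + beta_W * (x_ij - xbar_j) + e.
    No population-structure term u_j is included in this sketch."""
    x_between, x_within = within_group_centring(x, pop)
    design = np.column_stack([np.ones_like(x_between), x_between, x_within])
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return {"mu": coef[0], "beta_B": coef[1], "beta_W": coef[2]}

# Hypothetical genotypes (0/1/2) of nine individuals from three populations
x = [0, 1, 2, 0, 0, 1, 2, 2, 2]
pop = ["A", "A", "A", "B", "B", "B", "C", "C", "C"]
xb, xw = within_group_centring(x, pop)
y = 1.0 + 2.0 * xb + 0.5 * xw  # built with mu = 1, beta_B = 2, beta_W = 0.5
print(fit_centred_model(y, x, pop))
```

Because the within part sums to zero in every population, it is orthogonal to the population means, which is what lets βB and βW be estimated and tested separately.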

Performing genome-scan analyses using common garden data can have many advantages. First, if a strong adaptive signal is detected both by genome scan methods (that is, using genotypic and possibly environmental data) and in the phenotypic data from a common garden experiment, this constitutes two independent pieces of evidence favouring the hypothesis of local adaptation (Holderegger et al., 2008). As stated above, genome scan results need to be validated anyhow (Pardo-Diaz et al., 2014; Rellstab et al., 2015), and performing a common garden experiment is an elegant way to do so. We suggest that, whenever possible, combining genome scan approaches with common garden experiments is an efficient approach to the study of local adaptation. Second, by comparing the loci showing strong signals of differentiation with the loci associated with among-population phenotypic differentiation, it is possible to isolate candidate loci for local adaptation with very little information regarding the functional annotation of the species’ genome. Third, using the environmental information makes it possible not only to identify the selected phenotypes (that is, strongly differentiated genetically), but also to infer the environmental variable driving the selective pressure. In particular, if a locus is strongly associated both with an environmental variable and with the among-population phenotypic differentiation, one might conclude that a relationship exists between the environmental variable and the phenotype (although only correlatively: each variable is a putative proxy for the real selective/selected variable).
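Comparing genome-scan outliers with loci associated with among-population phenotypic differentiation (for example, through the βB parameter above) amounts, in its simplest form, to intersecting two sets of flagged loci. The locus names, q-values, p-values and thresholds below are hypothetical; in practice, thresholds would follow from each method's own error control.

```python
def candidate_loci(scan_qvalues, beta_b_pvalues, q_threshold=0.05, p_threshold=0.01):
    """Loci flagged both by a genome scan and by a between-population association test."""
    flagged_scan = {locus for locus, q in scan_qvalues.items() if q < q_threshold}
    flagged_assoc = {locus for locus, p in beta_b_pvalues.items() if p < p_threshold}
    return sorted(flagged_scan & flagged_assoc)

scan = {"locus1": 0.01, "locus2": 0.20, "locus3": 0.03, "locus4": 0.04}
assoc = {"locus1": 0.001, "locus2": 0.002, "locus3": 0.30, "locus4": 0.005}
print(candidate_loci(scan, assoc))  # ['locus1', 'locus4']
```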

An important problem when performing genome-scan analyses directly on common garden individuals is to correctly infer the allele frequencies of the source populations. The preferred way is simply to genotype the parents of the common garden individuals. However, this is not always possible (for example, in plants, genotyping the father is impossible most of the time). In that case, allele frequencies inferred directly from the offspring should be accurate, as long as there is no sex-dependent bias in allele frequencies. Nevertheless, the confidence in that inference will be overestimated because many related individuals were sampled. To account for this situation, a conservative solution is to calculate the allele frequencies from the common garden individuals, but to take the number of parents that generated the offspring as the sample size of these estimates. With this kind of data, all population-based methods (such as Bayescan, Foll and Gaggiotti, 2008, or BayEnv, Coop et al., 2010) can be used. A second solution, if the confidence in parental genotype reconstruction is high enough, is to use the inferred parental genotypes directly, both to infer allele frequencies in the population and as data for individual-based genome scan methods. Yet, in practice, these data will always be inferred with some uncertainty, and the consequences of ignoring this uncertainty in post hoc analyses are unknown. Still, the interest of this approach is that individual-based methods (such as Latent Factor Mixed Model, Frichot et al., 2013, or PCAdapt, Duforet-Frebourg et al., 2014) can be used to analyse the data. A last solution is the one implemented by De Kort et al. (2014), which consists of using only one individual per family. Although this solution requires a sufficiently large number of families for each population, it has the compelling advantage of simplicity and efficiency.
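The conservative solution described above can be sketched as follows: allele frequencies are estimated from all offspring, but their binomial standard error, and any rescaled allele counts, use 2 × (number of parents) allele copies. This assumes diploidy and genotypes coded as 0/1/2 allele counts; the function name and the rounding of counts are hypothetical choices, and no particular program's input format is implied.

```python
import math

def frequency_with_parental_n(offspring_genotypes, n_parents):
    """Allele frequency from diploid offspring genotypes (0/1/2 counts), with
    standard errors based on offspring alleles (naive) or parental alleles
    (conservative), plus an allele count rescaled to the parental sample size."""
    n_off = len(offspring_genotypes)
    p = sum(offspring_genotypes) / (2 * n_off)
    se_naive = math.sqrt(p * (1 - p) / (2 * n_off))
    se_conservative = math.sqrt(p * (1 - p) / (2 * n_parents))
    allele_count = round(p * 2 * n_parents)  # hypothetical rescaling for count-based methods
    return p, se_naive, se_conservative, allele_count

# Eight hypothetical full-sibs from two parents
print(frequency_with_parental_n([0, 1, 2, 1, 1, 0, 2, 1], n_parents=2))
# (0.5, 0.125, 0.25, 2)
```

The eight offspring carry only the information of the four parental allele copies, so the conservative standard error is twice the naive one here.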

Conclusion

Local adaptation is a play starring three actors: the environment, the phenotype and the genotype. The environment selects the phenotypes, which are (partly) determined by a number of genes. The evolutionary result is a change in allele frequencies of the polymorphic coding genes. Understanding the relationships between the three actors requires precise but large-scale measurements, rigorous experiments and powerful statistical methods. Because phenotypic plasticity is such a pervasive phenomenon and because it is nearly impossible to account for its effect on in situ phenotypes, phenotypes should never be directly compared between different populations, unless a case is made that the comparison is safe enough (low environmental contrasts or little phenotypic plasticity). In contrast, common garden experiments are ideally suited to perform such analyses, and hence to study the phenotypic traits affected by local adaptation. Now that dense marker panels are obtainable for many individuals at a moderate cost, common garden experiments are expected to be performed more routinely, unless the biological characteristics of the species (for example, size, behaviour, generation time) make this experiment impractical. Common gardens could possibly even replace the field work required to obtain tissue samples for genotyping: as mentioned above, they would still allow for population genomics approaches, while guaranteeing independent validation through the study of phenotypes (Pardo-Diaz et al., 2014; Rellstab et al., 2015), hence saving the cost of another genotyping campaign. As emphasised by Lepais and Bacles (2014), deciphering the genetic basis of local adaptation will only be accomplished by combining all the information yielded by dense marker panels, careful experiments and in situ sampling and observations.
Replicating common garden experiments in different environments can also provide insight into complicated relationships between the three actors, such as genotype-by-environment interactions. High-throughput genotyping provides an abundance of genetic data. Worldwide fine-scale databases (for example, WorldClim, Hijmans et al., 2005) and the advent of cheap in situ sensors also provide high-quality environmental data. However, collecting phenotypic data is still time consuming, tedious and sometimes expensive. It thus seems that the last challenge to be overcome is the development of high-throughput phenotyping, which would allow a scaling-up and a more widespread use of common garden experiments.