Introduction

Technological innovations in spatial analyses coupled with increased availability of spatial data and hypervariable genetic markers have resulted in great advances in our ability to study the influence of landscape variables, such as altitude, topography and ground cover, on genetic variation and structure. As a result, landscape genetics (Manel et al., 2003) has emerged as a new research area that integrates landscape ecology, spatial statistics and population genetics. In contrast to traditional population genetics studies that were limited in spatial inference to tests of isolation-by-distance, landscape genetics provides a framework for testing the relative influence of landscape and environmental features on gene flow, genetic discontinuities (Guillot et al., 2005a) and genetic population structure (Manel et al., 2003; Holderegger and Wagner, 2006).

Understanding landscape effects on genetic connectivity provides insight into fundamental biological processes such as metapopulation dynamics, speciation and, ultimately, the formation of species' distributions. Landscape genetic analyses can also have great applied scientific value, such as identifying specific anthropogenic barriers that reduce gene flow or genetic diversity, predicting the effects of proposed management alternatives on genetic variation and population connectivity, and identifying potential biological corridors to assist with reserve design.

Given these diverse research opportunities, landscape genetics is both challenging and exciting, as it brings together scientists from the broad disciplines of landscape ecology, spatial statistics, geography and population genetics. While several types of spatial statistical analyses have been used in geographical genetics (for review, see Epperson, 2003), there are many well-developed methods in landscape ecology and spatial statistics that have yet to be utilized. The vast array of spatial analysis techniques that can be applied to population genetic data make options for designing and conducting a landscape genetics study extremely diverse and potentially confusing. Better communication among landscape ecologists, spatial statisticians, remote-sensing scientists, geographers and population geneticists is key to integrating analysis methods and empirical data. To help bridge communication gaps, we have included a glossary of terminology used in spatial statistics and landscape ecology (Table 1), with the terms denoted in italics when first used in the text.

Table 1 Glossary of spatial statistics/landscape ecology/geographical genetics terms used in the text

Our goals are to: (a) offer a definition of the term ‘landscape genetics’; (b) review questions commonly addressed in the landscape genetics literature; (c) provide guidelines for sampling design; (d) highlight potentially useful analysis techniques; and (e) discuss future directions for the field. The most commonly used molecular tools for landscape genetic studies are neutral, hypervariable markers (e.g. amplified fragment length polymorphisms and microsatellites), and we generally assume their use throughout this review.

Defining landscape genetics

In general, landscape genetics seeks to understand the influence of ecological processes (Turner et al., 2001) on genetic variation by quantifying the relationship between landscape variables, population genetic structure and genetic variation (the latter two hereafter collectively referred to as ‘genetic variation’). Since Manel et al. (2003) coined the term ‘landscape genetics,’ a diversity of published articles have been labeled landscape genetics because they incorporated geographic coordinates or landscape features when evaluating the spatial distribution of genetic variation (Table 2). These studies have varied extensively in their approach to evaluating relationships between landscape variables and genetic variation. We suggest that landscape genetics studies could benefit substantially by including explicit tests of the relative influence of landscape variables on genetic variation by incorporating robust, spatially informed study designs and spatial analyses. Thus, in this review, we define landscape genetics as research that explicitly quantifies the effects of landscape composition, configuration and matrix quality on gene flow and spatial genetic variation. This definition expands on the description of landscape genetics in Holderegger and Wagner (2006). Phylogeography (Avise, 2000) can also be used to quantify genetic variation in relation to ecological processes, but at a larger spatio-temporal scale than landscape genetics, making it more comparable to biogeography (Manel et al., 2003).

Table 2 Recent works in landscape genetics, highlighting sampling design, molecular markers used, analytical techniques, and basic conclusions

Major research categories in landscape genetics

There are a wide variety of basic and applied research questions that can be addressed using a landscape genetics approach (see Tables 2 and 3). We group these questions under five major research categories: (1) quantifying influence of landscape variables and configuration on genetic variation; (2) identifying barriers to gene flow; (3) identifying source-sink dynamics and movement corridors; (4) understanding the spatial and temporal scale of an ecological process; and (5) testing species-specific ecological hypotheses.

Table 3 Models that can explicitly incorporate landscape data organized by research questions presented in text

Influence of landscape variables and configuration on genetic variation

Quantifying the effect of landscape configuration on gene flow has been a major focus of published landscape genetics studies (Manel et al., 2003; Scribner et al., 2005; see Table 2). Statistical analyses of genetic data have been used to identify the effects of matrix resistance on gene flow and genetic structure, including cover type (Keyghobadi et al., 1999; Spear et al., 2005), stream distance (Roach et al., 2001; Antolin et al., 2006), historic landscape configuration (Holzhauer et al., 2006), water flow rates (Michels et al., 2001), ridge distances (Pfenninger, 2002; Funk et al., 2005), thermal cover (Scribner et al., 2005) and the effect of landscape configuration on allelic fixation time (Ezard and Travis, 2006). Recently, Sezen et al. (2005) revealed the impact of landscape change and patch type on genetic variation in Costa Rican canopy palms (Iriartea deltoidea) by documenting a decrease in genetic diversity and an increase in the patch diameter of similar genotypes in second growth forests compared to old growth patches. This demonstrates that intrinsic scale, the area encompassed by a population as estimated by a genetic neighborhood, may change across a landscape due to the landscape composition and configuration. As another example, wolverine populations in intact habitats have a larger intrinsic scale than populations in fragmented habitats (Cegelski et al., 2003).

Identifying barriers

Identifying potential gene flow barriers is a major focus of landscape genetics research. While all landscape features affect gene flow, particular structures such as roads (Riley et al., 2006), waterways (Antolin et al., 2006) or mountain ridges (Funk et al., 2005) are potentially impenetrable barriers. Genetic data have been used to identify abrupt breaks in gene flow (Dupanloup et al., 2002; Manni et al., 2004) as well as more gradual transitions (Geffen et al., 2004). Barriers may also consist of microhabitats that prevent gene flow because they exceed a threshold for moisture, temperature or chemical tolerance for particular species (Palo et al., 2004). Therefore, barrier identification has important implications for ecological (Walker et al., 2003; Kreyer et al., 2004; Funk et al., 2005), conservation (Bhattacharya et al., 2003; Miller and Waits, 2003; Dodd et al., 2004) and evolutionary (Castella et al., 2000; Broderick et al., 2003; Cicero, 2004; Gee, 2004) investigations.

One distinct benefit of a landscape genetics approach is that spatially explicit techniques can allow researchers to identify barriers not detectable by traditional population genetic methods (Guillot et al., 2005a; Coulon et al., 2006). For example, Coulon et al. (2006) found genetic structuring in roe deer due to highways and rivers using spatial assignment test methods (in the program GENELAND, Guillot et al., 2005b), whereas a non-spatial assignment test (STRUCTURE; Pritchard et al., 2000) was not able to identify any genetic discontinuities. Landscape genetics can also be used to quantify the cumulative impact of a particular barrier type distributed across the landscape. For example, Epps et al. (2005) evaluated genetic diversity and structure in 27 desert bighorn sheep populations (Ovis canadensis nelsoni) and showed that diversity was negatively correlated with fenced highways, canals and human development. They also estimated a ‘barrier effect distance’ and suggested that any one of these barriers would create the same decrease in gene flow as 40 km of contiguous habitat.

Source-sink dynamics

Understanding source-sink dynamics (Pulliam, 1988; Dias et al., 1996) and variation in habitat quality can be useful for identifying corridors and guiding reserve design. Genetic data have been used to identify source and sink habitats for populations by identifying asymmetric gene flow using private alleles (Kennington et al., 2003), and by estimating the number of migrants into a population using either a coalescent approach (Beerli and Felsenstein, 2001) or assignment tests (Paetkau et al., 1995; Wilson and Rannala, 2003). Theoretical population models suggest that evaluations of linkage disequilibrium can be used to detect sink habitats because disequilibrium is predicted to be higher in individuals from sinks due to immigrants from different sources (Nei and Le, 1973). Dias et al. (1996) empirically tested this theory and found higher linkage disequilibrium in blue tits (Parus caeruleus) living in a known sink habitat (evergreen forest) than in birds sampled from a known source habitat (deciduous forest).

Physical locations of corridors have also been identified using landscape genetic approaches. For example, Banks et al. (2005) used spatial autocorrelation analysis of genetic structure to demonstrate that riparian strips of native eucalyptus forest facilitate dispersal in a marsupial carnivore (Antechinus agilis) compared to the matrix habitat of pine plantations. This study also helped detect a dispersal threshold for male A. agilis by showing that fragmented pine plantation habitat significantly reduced dispersal at distances greater than 750 m. Least-cost analysis has also been valuable in identifying landscape variables that facilitate gene flow and may function as corridors (Spear et al., 2005; Vignieri, 2005). For example, Vignieri (2005) found that least-cost paths that minimized elevation gain and maximized riparian forest cover were strongly correlated with gene flow in the Pacific jumping mouse (Zapus trinotatus).

Spatial and temporal scales

Genetic variation may respond differently over varying spatial or temporal scales, which is a critical issue in defining research questions and subsequent study design in landscape ecology and spatial statistics (for review, see Gardner, 2001). The scale at which particular landscape variables have the greatest influence on gene flow (i.e. process scale) may give insight into species' biology. For example, Trapnell and Hamrick (2004) showed that the contributions of pollen and seed movement to overall gene flow in the Central American epiphytic orchid, Laelia rubescens, were scale-dependent. Primary factors governing gene flow were seed gravity (seed dispersal) at the finest spatial scale, hummingbird behavior (pollen dispersal) at the intermediate scale, and wind (occasional seed dispersal) at the broadest scale.

Temporal scale may also have a significant impact on landscape genetics. For example, Ramstad et al. (2004) detected significant genetic structure among sockeye salmon (Oncorhynchus nerka) populations in Alaska and used simple and partial Mantel tests to evaluate the relative influence of different ecological and evolutionary factors on genetic differentiation. They found that temporal isolation based on spawning time and founder effects associated with ongoing glacial retreat and colonization of new spawning habitats contributed significantly to genetic population structure, while geographic distance and spawning habitat differences did not have significant influence.

Species-specific hypothesis testing

Landscape genetics offers new approaches for testing hypotheses specifically related to how the ecology of the study species shapes patterns of genetic variation, such as identification of bioregions (Sacks et al., 2004), potential response to climate change (Rehfeldt et al., 1999), and ecological variables (Jørgensen et al., 2005). For example, Sacks et al. (2004) tested the hypothesis that coyotes would exhibit natal-biased dispersal by evaluating population genetic structure across four contiguous habitat bioregions in Northern California and found genetic groupings could best be explained by habitat bioregions and not habitat barriers. Rehfeldt et al. (1999) concluded that the ability of western US conifer species to adapt to climate change may be constrained due to significant geographic structure in quantitative genetic traits associated with elevation and latitude. In a study of herring (Clupea harengus) in the Baltic Sea, Jørgensen et al. (2005) showed that the ecological variables that best explained genetic variation were salinity, surface temperature and spawning time.

Study design

Importance of study design

Historically, population genetic studies often relied on opportunistically collected samples from known localities or in easily accessible areas. Opportunistic sampling may fail to capture the spatial variation or spatial dependency of the system, resulting in difficulty detecting spatial relationships or erroneous model inferences (Legendre, 2002; Fortin and Dale, 2005). Thus, studies should be designed to sample the variable(s) of interest within the scale of spatial dependency (Coulon et al., 2004). In addition, population genetics studies have been traditionally designed to collect samples from a minimum of 20–30 individuals per a priori delineated ‘population’ (Nei, 1978). However, a priori delineation is no longer necessary due to the development of genetic clustering algorithms, such as assignment tests (Pritchard et al., 2000; Wilson and Rannala, 2003; Manel et al., 2005). In addition, models of landscape influence on genetic variation often require more continuously distributed sampling, which can be analyzed with a wide variety of spatial analysis techniques reviewed herein (see Table 3).

Careful consideration of scale in study design is also critical because arbitrarily defined scales may lead to erroneous conclusions or fail to capture appropriate variability in the landscape (King, 1990; Gardner, 2001). We refer to scale as the appropriate spatial or temporal dimensions at which processes can be observed and quantified (for review, see Dungan et al., 2002). Under this definition, scale has two relevant components: ‘grain’ and ‘extent’ (O'Neill et al., 1988). Grain is the minimum resolution of the data and extent is the total area of interest. Accounting for grain and extent in both the dependent and independent variables ensures that the appropriate scale for a specific question(s) is captured (for review, see Marceau, 1999). When designing a landscape genetics study, researchers can determine appropriate sampling scale with pilot data (e.g. movement data of the study organism), or data from similar organisms in other areas. Ideally, a spatial sampling scheme should capture the range of spatial variation in both landscape variables and organismal genetic variability by collecting observations across a range of variances (e.g. from low to high local spatial autocorrelation and from low to high genetic relatedness, respectively) (Cressie, 1993; Goovaerts, 1997; Fortin and Dale, 2005).

If pilot data are available, exploration of intrinsic scale and stationarity can also be executed before sampling or modeling. Most spatial statistical models assume stationarity in the response variable that is implicitly a function of a spatial generating process (Fortin and Dale, 2005). In other words, the underlying assumption of most spatial statistical models is that the same ecological processes (e.g. slope, moisture and topography) influence the response variable (e.g. genetic variation) in the same way across the whole study area. However, this assumption is likely to be violated in complex landscapes. Violations of stationarity can be tested with local indicators of spatial association (LISA) (Anselin, 1995). Non-stationarity can then be accounted for by incorporating locally adjusted methods such as geographically weighted regression or locally coherent spatial regression (Haining, 2003; Fotheringham et al., 2004).
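As an illustration of the LISA idea, the following minimal sketch computes a local Moran's I value for each sampling site from a list of measurements and a spatial weights matrix. The function name, the row-standardisation of weights and the toy transect are assumptions made for illustration; significance testing (for example by conditional permutation) is deliberately omitted.

```python
from statistics import fmean


def local_morans_i(values, weights):
    """Local Moran's I for each observation (illustrative sketch).

    values  : list of n measurements (e.g. a genetic diversity index per site)
    weights : n x n spatial weights as a list of lists (diagonal ignored)
    """
    n = len(values)
    mean = fmean(values)
    dev = [v - mean for v in values]
    m2 = sum(d * d for d in dev) / n  # second moment about the mean
    if m2 == 0:
        raise ValueError("values are constant; local Moran's I is undefined")

    result = []
    for i in range(n):
        row_sum = sum(weights[i][j] for j in range(n) if j != i)
        if row_sum == 0:
            result.append(0.0)  # site has no neighbours
            continue
        # Row-standardised spatial lag of the deviations around site i.
        lag = sum(weights[i][j] * dev[j] for j in range(n) if j != i) / row_sum
        result.append(dev[i] * lag / m2)
    return result


# Toy transect: six sites, each adjacent to its immediate neighbours.
values = [1, 1, 1, 5, 5, 5]
weights = [[1 if abs(i - j) == 1 else 0 for j in range(6)] for i in range(6)]
lisa = local_morans_i(values, weights)
```

In this toy example, sites in the interior of each homogeneous block receive positive values, while the two sites at the boundary between blocks receive values near zero; strong spatial variation in local values of this kind is one informal sign that a single global model may not fit the whole study area.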

In landscape genetic studies, it is important to consider how the effects of temporal scale influence genetic variation. The theoretical genetics literature has addressed how temporal heterogeneities in demographic parameters such as dispersal and density can have large effects on spatial genetic variation (Whitlock, 1992; Leblois et al., 2004). However, this complexity is often overlooked in landscape genetic studies. For species with overlapping generations and cyclic changes in density, allele frequency shifts among cohorts and sampling years are expected (Jorde and Ryman, 1995). If not accounted for, temporal genetic heterogeneity can be incorrectly interpreted as true population differentiation, particularly in cases when genetic differentiation is weak (Waples, 1998).

Incorporating landscape data

Landscape genetic studies require data from two distinct sources: landscape data (e.g. remotely sensed data, digital elevation models, field collections) and multilocus genetic data. Landscape data are gathered in a number of ways, including field surveys, aerial remote sensing and satellite remote sensing. As these methods contain several potential sources of error that may obscure relationships with genetic variation (e.g. temporal sampling frequency, data generalization; Burrough, 1986), use of data sets with error documentation and associated metadata is highly desirable (see the US Federal Geographic Data Committee website, http://www.fgdc.gov/standards/standards_publications, for lists of metadata standards).

Recent advances in fine scale data resolution (<4 m) and analysis methods have greatly improved the spatial accuracy and precision of detecting, classifying and delineating landscape habitat characteristics in both two (Wulder et al., 2004; Greenberg et al., 2005) and three dimensions (Lefsky et al., 2002). Such fine scale data can help create detailed digital elevation models and spatially explicit vegetation canopy structure products. At coarser spatial scales (resolution 250 m–1.1 km), image data are acquired for the entire globe twice per day, enabling analyses at unprecedented temporal resolution (Rahman et al., 2004; Running et al., 2004) to compare vegetation phenology with genetic variation in plants and animals. However, it is important to note that the scale at which data are collected should match the scale of the study questions and hypotheses.

Sampling

When selecting a sampling design, there are several important considerations. First, what is the research or management question? For instance, identifying specific barriers to gene flow will require a different sampling scheme than identifying source and sink habitats. Second, is the species continuously distributed across the landscape or does it have a spatially random or aggregated distribution? To make inferences about landscape influence on genetic variation, the sampling scheme should capture the variability in the independent landscape variables of interest (Bueso and Angulo, 1999). Third, based on the sampling design and data collected, which types of statistical analysis models are appropriate? (see Table 3; Forman, 1997).

Simple sampling designs discussed in this paper include systematic (uniform, unaligned, random start) and random (simple random or clustered) (Figure 1). Systematic sampling designs (Figure 1a–c) are more efficient in covering the extent of the study area than random samples (Harrison and Dunn, 1993). However, a uniform systematic design (Figure 1a) can be biased when it coincides in frequency with a regular pattern in the landscape due to a fixed sampling interval, and may miss anisotropic effects (Berry and Baker, 1968; Fortin and Dale, 2005). Random sampling designs have the advantage of producing a spatially unbiased sample and can be applied to continuously distributed species or a subset of the landscape based on required habitat conditions (Figure 1d–e). A random sample can be beneficial because it generates a wide range of distances among points, helping describe the relationships between observations across space if sampling is dense enough such that sample observations cluster appropriately to optimize semivariogram estimation. However, the distribution of samples may not be representative of the underlying geographic surface, because for most samples drawn, some areas will be oversampled while others will remain undersampled (Griffith and Amrhein, 1997).
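To make the contrast between these designs concrete, the sketch below generates sampling locations for a rectangular study area under three of the schemes: uniform systematic, random start systematic and simple random. The function names, the rectangular extent and the spacing values are illustrative assumptions; unaligned and clustered designs are not shown.

```python
import random


def _grid(width, height, spacing, ox, oy):
    """Regular lattice of points with origin (ox, oy) inside the extent."""
    points = []
    i = 0
    while ox + i * spacing < width:
        j = 0
        while oy + j * spacing < height:
            points.append((ox + i * spacing, oy + j * spacing))
            j += 1
        i += 1
    return points


def uniform_systematic(width, height, spacing):
    """Fixed-interval grid centred within each grid cell."""
    return _grid(width, height, spacing, spacing / 2, spacing / 2)


def random_start_systematic(width, height, spacing, rng):
    """Same fixed interval, but the whole grid is shifted by a random offset."""
    return _grid(width, height, spacing,
                 rng.random() * spacing, rng.random() * spacing)


def simple_random(width, height, n, rng):
    """n locations drawn independently and uniformly over the extent."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


rng = random.Random(42)  # fixed seed so the example is repeatable
grid_points = uniform_systematic(100, 100, 25)
shifted_points = random_start_systematic(100, 100, 25, rng)
random_points = simple_random(100, 100, 16, rng)
```

The two systematic designs share a single inter-point spacing (and its multiples), whereas the simple random design yields many different pairwise distances, which is the property that benefits semivariogram estimation.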

Figure 1

Visual representation of discussed sampling approaches. Diagrams with black backgrounds represent sampling approaches appropriate for continuously distributed populations, gray backgrounds represent sampling approaches appropriate for continuously distributed or clustered populations, and the white background represents a sampling approach most appropriate for clustered populations. Systematic sampling (a–c) designs assume a relatively even distribution of potential observations across the landscape and therefore can be used with continuously distributed populations. Black dots represent sampling locations for (a) uniform sampling, (b) unaligned sampling and (c) random start systematic sampling. Random sampling designs (d–e) ensure independence between samples because each location (or cluster) in a study area has the same probability of being chosen (Berry and Baker, 1968). Black dots represent sampling locations for (d) simple random sampling and (e) clustered random sampling. More complex sampling designs can be applied with either simple or random designs. Hierarchical sampling (f) is generally appropriate for collection of data at two scales: black dots indicate sample locations for the broad scale variable; a second variable is sampled at a finer scale at the black and gray dots. Nested sampling (g) requires the landscape to be partitioned into systematic or irregular sample units (i.e. ‘blocks’), which are then further subdivided into sample units nested within this first level, and so forth (Haining, 1990; Thompson and Seber, 1996; Haining, 2003). Block 1 represents the largest area, which is then partitioned into two Block 2 units. The Block 2 units are then further subdivided into the smallest units (Block 3). In stratified sampling (h) the landscape is partitioned based on some variable(s) that are unequally represented in the landscape and observation(s) are taken from each stratum (Hudak et al., 2004). Lines represent strata designed to partition variation across the landscape; black dots represent random sample locations within each stratum. Shown is equal sampling by strata, but sampling can also be unequal.

These general sampling schemes can be implemented within more complex designs such as spatial hierarchical sampling (Figure 1f), nested sampling (Figure 1g) and stratified sampling (Figure 1h). In a hierarchical sampling design, one measurement (e.g. cover type) is collected more intensively and at a finer spatial scale than a second variable (e.g. genotypes) (Haining, 1990). Landscape variables may be less expensive to measure and change more rapidly in space than genetic variation, making spatial hierarchical sampling a viable option in landscape genetic studies. Spatially nested sampling designs generally work well for populations that are naturally clustered or for questions of spatial scale. For example, at a broad scale (block 1), limits to gene flow across the landscape may be explained by ridgelines and canyons (Figure 1g). However, at a finer scale (block 3), limits to gene flow may be driven by microhabitat conditions.

A stratified sampling design can be implemented to capture the range of variability across landscape variable(s) of interest and to account for unbalanced representation of landscape variables (e.g. 80% of land cover is of one class) (Figure 1h). Any of the previously mentioned sampling designs can be applied within each stratum. Stratified random sampling is particularly appropriate for gradient analysis and the impact of landscape configuration when variables (e.g. rainfall) are unequally represented across the study area, when the sampling area is large compared to the number of feasible observations, or when the process (e.g. gene flow) is non-stationary. Stratification guarantees a spread of observations across defined conditions in the study area, while randomization leads to a wide range of distances among pairs of points.
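A stratified random design of the kind described above can be sketched as follows. The example assumes the landscape has already been divided into cells, each labelled with a single stratum (here a hypothetical land-cover class), and draws an equal number of cells at random from each stratum; the function name and the cover classes are illustrative only.

```python
import random
from collections import defaultdict


def stratified_random_sample(cells, n_per_stratum, rng):
    """Draw up to n_per_stratum cells at random from every stratum.

    cells : iterable of (cell_id, stratum_label) pairs
    Returns a dict mapping each stratum label to its sampled cell ids.
    """
    by_stratum = defaultdict(list)
    for cell_id, stratum in cells:
        by_stratum[stratum].append(cell_id)

    sample = {}
    for stratum, ids in by_stratum.items():
        # Small strata may hold fewer cells than requested; take them all.
        k = min(n_per_stratum, len(ids))
        sample[stratum] = rng.sample(ids, k)
    return sample


# Hypothetical landscape: 80% forest, 15% meadow, 5% wetland.
cells = ([(i, "forest") for i in range(80)]
         + [(i, "meadow") for i in range(80, 95)]
         + [(i, "wetland") for i in range(95, 100)])
sample = stratified_random_sample(cells, 5, random.Random(7))
```

With simple random sampling of 15 cells from this landscape, the rare wetland class would often be missed entirely; stratification guarantees that it is represented. Unequal allocation (for example proportional to stratum variance) would only require replacing the fixed `n_per_stratum` with a per-stratum value.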

Analysis methods

Analysis techniques in landscape genetic studies have utilized several statistical approaches including: assignment tests, tests of matrix correlations, dispersal route analysis and autocorrelation. Other well-developed methods in landscape ecology, spatial statistics and geographical genetics that have received less attention are: ordination, landscape metrics and spatial interpolation. These methods can be applied using a wide variety of statistical software (see Appendix A) and are discussed in detail below.

Assignment tests

While traditional population genetic statistics (e.g. FST, RST, θ, Nei's D) often require a priori population delineation, assignment test approaches (Pritchard et al., 2000; Manel et al., 2005) can be used to gain insight into landscape genetic patterns without a priori delineation of populations. For example, Proctor et al. (2005) sampled brown bears (Ursus arctos) on both sides of highways in southern British Columbia and Alberta, Canada, and an ecologically similar ‘control’ valley not impacted by a transportation corridor. Using assignment tests, they detected lower levels of gene flow across the highway than in the control valley.
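The core calculation behind a simple frequency-based assignment test can be sketched as follows. This is not the Bayesian clustering approach of the programs cited above; it is a minimal illustration that assumes known allele frequencies in each candidate population, Hardy-Weinberg proportions within populations and independence among loci. All names and the toy frequencies are hypothetical.

```python
from math import log, inf


def genotype_log_likelihood(genotype, allele_freqs):
    """Log-likelihood of a diploid multilocus genotype in one population.

    genotype     : dict locus -> (allele1, allele2)
    allele_freqs : dict locus -> dict allele -> frequency in the population
    """
    total = 0.0
    for locus, (a1, a2) in genotype.items():
        p = allele_freqs[locus].get(a1, 0.0)
        q = allele_freqs[locus].get(a2, 0.0)
        # Hardy-Weinberg genotype probability: p^2 or 2pq.
        prob = p * p if a1 == a2 else 2 * p * q
        if prob == 0:
            return -inf  # genotype impossible given these frequencies
        total += log(prob)
    return total


def assign(genotype, populations):
    """Return the population label with the highest log-likelihood."""
    scores = {name: genotype_log_likelihood(genotype, freqs)
              for name, freqs in populations.items()}
    return max(scores, key=scores.get), scores


populations = {
    "north": {"loc1": {"A": 0.9, "B": 0.1}, "loc2": {"C": 0.7, "D": 0.3}},
    "south": {"loc1": {"A": 0.1, "B": 0.9}, "loc2": {"C": 0.2, "D": 0.8}},
}
individual = {"loc1": ("A", "A"), "loc2": ("C", "D")}
best, scores = assign(individual, populations)
```

In practice, alleles absent from a reference sample are usually given a small non-zero frequency so that a single rare allele does not exclude a population outright; that correction is omitted here for brevity.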

Another approach is to identify spatial genetic discontinuities. Manni et al. (2004) used Voronoi polygons to build geometric relationships between populations and then applied a modified version of Monmonier's algorithm (1973) to identify genetic discontinuities to infer a barrier to gene flow. Monmonier's algorithm creates a break line through the vectors between observations (i.e. the lines of the Voronoi polygons) based on the greatest slope, which reflects higher genetic distances.

Spatial assignment tests try to identify genetic discontinuities in populations that are spatially contiguous (Guillot et al., 2005a, 2005b; Corander et al., 2006; Francois et al., 2006; Manel et al., submitted). GENELAND is a spatial-assignment test that combines Voronoi tessellation with a Bayesian assignment approach to identify genetic spatial discontinuities (Guillot et al., 2005b). Populations are specified conditionally on the spatially weighted variance structures identified within each Voronoi polygon (Guillot et al., 2005b). Manel et al. (submitted) used a moving window approach to apply a local assignment test (Rannala and Mountain, 1997) and generate a spatially referenced probability map for finding an individual's genotype across the landscape. Genetic discontinuities are identified by calculating the mean slope for all individual probability maps and identifying areas of high mean slope as putative population boundaries (Manel et al., submitted). Note, however, that all the above spatial assignment methods perform best with a relatively continuous distribution of sample locations, such as generated by unaligned/random start systematic sampling or dense random sampling (Figure 1).

Matrix correlations

Mantel tests (Mantel, 1967) and partial Mantel tests (Smouse et al., 1986; Legendre et al., 2002) have been widely used in landscape genetic studies. For example, Keyghobadi et al. (1999) used Mantel tests to demonstrate that meadows facilitated higher gene flow among alpine butterfly (Parnassius smintheus) populations than forests. Mantel tests have also been used to explain differences in flowering time of three species of alpine meadow plants based on approximate date of snowmelt across a range of elevations (Hirao and Kudo, 2004).

Partial Mantel tests have been used to expand analyses to incorporate multiple landscape variables. For example, Roach et al. (2001) evaluated the relative importance of partial Mantel variables using Akaike's Information Criterion (AIC) and showed that distance measured along drainages and age of the colony were the most important predictors of genetic distance among black-tailed prairie dog (Cynomys ludovicianus) colonies. In addition, Spear et al. (2005) included landscape variables (including 10 cover types) in their analysis of tiger salamander gene flow and showed that including topographic distance, rivers and shrub habitat improved the overall model fit by 50% compared to using straight-line Euclidean distance. It is noteworthy that there has been recent controversy about the statistical validity of partial Mantel tests (Raufaste and Rousset, 2001; Castellano and Balletto, 2002; Rousset, 2002), particularly given their widespread use in landscape genetic studies. However, variations of the partial Mantel test may circumvent certain bias and autocorrelation problems (Legendre, 1993, 2002).

Dispersal route analysis

Dispersal route analyses (including alternate paths, least-cost paths and some network analyses) test the correlations of alternative paths with genetic distance to infer the most likely path of gene flow based on either pre-defined paths or a priori weights along a vector or a friction surface (e.g. Michels et al., 2001; Spear et al., 2005). Alternate path approaches have included distances along rivers for aquatic or riparian associated species (Michels et al., 2001; Funk et al., 2005; Poissant et al., 2005; Vignieri, 2005) or the shortest straight-line path through suitable habitat (Pfenninger, 2002; Arnaud, 2003; Coulon et al., 2004). Least-cost models have been based on water flow for passive dispersal (Michels et al., 2001), topography and moisture (Spear et al., 2005; Vignieri, 2005) and thermal cover (Scribner et al., 2005). For example, Arnaud (2003) calculated a least-cost path between snail sampling sites that followed canal embankments considered good habitat for snail dispersal. In Mantel correlograms, least-cost distance was more strongly correlated with genetic distance than was straight-line distance, suggesting that reserve design may focus on conservation of canal embankments. These studies demonstrated that pre-defined paths and least-cost paths usually explain more variation in gene flow than straight-line Euclidean distance as estimated in traditional isolation-by-distance studies.
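A least-cost distance over a friction surface can be computed with a standard shortest-path algorithm. The sketch below applies Dijkstra's algorithm to a raster of resistance values, assuming four-neighbour movement and a step cost equal to the mean resistance of the two cells involved. These conventions, the function name and the toy raster are illustrative assumptions, not the procedure of any study cited above.

```python
import heapq


def least_cost_distance(resistance, start, goal):
    """Accumulated cost of the cheapest path between two raster cells.

    resistance  : list of rows, each a list of positive resistance values
    start, goal : (row, col) tuples
    """
    rows, cols = len(resistance), len(resistance[0])
    best = {start: 0.0}
    queue = [(0.0, start)]
    while queue:
        cost, (r, c) = heapq.heappop(queue)
        if (r, c) == goal:
            return cost
        if cost > best[(r, c)]:
            continue  # stale queue entry; a cheaper route was already found
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                # Step cost: mean resistance of the cell left and the cell entered.
                step = (resistance[r][c] + resistance[nr][nc]) / 2
                new_cost = cost + step
                if new_cost < best.get((nr, nc), float("inf")):
                    best[(nr, nc)] = new_cost
                    heapq.heappush(queue, (new_cost, (nr, nc)))
    return float("inf")  # goal unreachable


# Toy landscape: a high-resistance ridge (9) with a low-resistance gap below it.
surface = [[1, 9, 1],
           [1, 9, 1],
           [1, 1, 1]]
around_ridge = least_cost_distance(surface, (0, 0), (0, 2))
open_ground = least_cost_distance([[1, 1, 1]], (0, 0), (0, 2))
```

Here the two sites are only two cells apart in a straight line, yet the cheapest route detours around the ridge, so the least-cost distance is three times the distance over open ground. A matrix of such pairwise costs can then be compared with genetic distances in place of Euclidean distance.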

Autocorrelation

Typical spatial analyses of genetic variation involve quantifying genetic variation over distance (conceptually similar to isolation-by-distance) based on spatial autocorrelation such as a Moran's I statistic (Epperson and Li, 1996; Arnaud, 2003) or semivariograms (Liepelt et al., 2002; Wagner et al., 2005). A semivariogram provides a graphical description of the variance between binned observations at different distance classes (e.g. <5 m, 5–10 m, etc.) across the landscape (Fortin and Dale, 2005). Thus, semivariogram values (in the presence of autocorrelation) are generally small when observations are close together, and become larger as distances increase (Fortin and Dale, 2005). For example, Wagner et al. (2005) used directional semivariograms to show distinct anisotropy (i.e. directionality) in genetic variation as influenced by mode of reproduction (clonal or sexual) and wind direction in the epiphytic lichen, Lobaria pulmonaria. Semivariogram modeling mathematically describes spatial autocorrelation, and by fitting a variogram model to the empirical semivariogram, the population variance can be estimated accounting for spatial autocorrelation. Thus, semivariogram modeling provides parameter estimates that are comparable between studies (Wagner et al., 2005).
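The empirical semivariogram described above can be computed directly from its definition: for each distance class, half the mean squared difference between all pairs of observations whose separation falls in that class. The sketch below is omnidirectional and uses a single numeric variable per site; the function name, the bin settings and the toy transect are illustrative assumptions, and model fitting is not shown.

```python
from math import hypot


def empirical_semivariogram(coords, values, bin_width, n_bins):
    """Return a list of (bin midpoint, semivariance or None, pair count).

    coords : list of (x, y) site coordinates
    values : one numeric measurement per site
    """
    sums = [0.0] * n_bins
    counts = [0] * n_bins
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            d = hypot(coords[i][0] - coords[j][0], coords[i][1] - coords[j][1])
            k = int(d // bin_width)  # index of the distance class
            if k < n_bins:
                sums[k] += (values[i] - values[j]) ** 2
                counts[k] += 1
    result = []
    for k in range(n_bins):
        midpoint = (k + 0.5) * bin_width
        # Semivariance is half the mean squared difference within the class.
        gamma = sums[k] / (2 * counts[k]) if counts[k] else None
        result.append((midpoint, gamma, counts[k]))
    return result


# Toy transect with a steady gradient: nearby sites are similar.
coords = [(0, 0), (1, 0), (2, 0), (3, 0)]
values = [0, 1, 2, 3]
semivariogram = empirical_semivariogram(coords, values, bin_width=1, n_bins=4)
```

For this gradient the semivariance rises with each distance class, the pattern expected under positive spatial autocorrelation. Restricting the pairs to those lying within a given compass sector would give a directional semivariogram.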

Spatial autocorrelation approaches are also useful for defining the scale of genetic neighborhoods. Peakall et al. (2003) used spatial autocorrelation incorporating multiple microsatellite loci to identify the scale of genetic variation over a 1 km transect in bush rats (Rattus fuscipes), categorized by age and sex. They found that subadult bush rats had a larger extent of genetic similarity than adults (400 versus 200 m) and that females had higher similarity over a larger area than males (400 versus 300 m). Additional methods exist for examining effects of multiple scales, such as hierarchical analysis of variance and delineation of semivariance (Cressie, 1993; Liburne et al., 2004). These methods can be coupled with simulation techniques to examine the expected genetic neighborhood size, or to test how variability or pattern changes at different spatial scales (Gardner et al., 1987; Saura and Martínez-Millán, 2001; Wu, 2004).
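
To illustrate how autocorrelation within distance classes can indicate the scale of genetic similarity, the sketch below computes Moran's I with binary weights for a chosen distance class. It uses a single numeric variable per site, which is a simplification and not the multilocus coefficient used by Peakall et al. (2003); the function name and the example data are hypothetical.

```python
import math

def morans_i(coords, values, d_min, d_max):
    """Moran's I with binary weights: w_ij = 1 when d_min < distance <= d_max."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    cross = 0.0   # sum of w_ij * dev_i * dev_j over ordered pairs
    w_total = 0   # sum of all weights
    for i in range(n):
        for j in range(n):
            if i != j and d_min < math.dist(coords[i], coords[j]) <= d_max:
                cross += dev[i] * dev[j]
                w_total += 1
    if w_total == 0:
        return None  # no pairs fall in this distance class
    return (n / w_total) * cross / sum(d * d for d in dev)

# Hypothetical transect with two clusters of similar values
coords = [(x, 0) for x in range(6)]
values = [1, 1, 1, 0, 0, 0]
near = morans_i(coords, values, 0, 1)  # adjacent sites
far = morans_i(coords, values, 2, 3)   # sites three units apart
print(near, far)
```

Here adjacent sites are positively autocorrelated and sites three units apart are negatively autocorrelated. The distance at which such a correlogram crosses zero is one common way to summarize the extent of similarity.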

Ordination

Ordination is another powerful tool that can be applied to identify the influence of continuous variables and gradients on genetic variation. Ordination is used to elucidate the pattern of the spatial arrangement of individuals using canonical correlation, principal components analysis or multidimensional scaling (Jongman et al., 1995). Canonical correspondence analysis (CCA) is a type of ordination especially suited for investigating relationships between species and their environment to effectively identify gradients in environmental variables (ter Braak and Verdonschot, 1995); space can be explicitly incorporated as a covariate in CCA (Borcard et al., 1992). CCA was applied to genetic data to determine that genetic variation of the brook charr (Salvelinus fontinalis) was influenced by drainage pattern, lake altitude, stocking history and number of fish species (Angers et al., 1999). Genetic data were incorporated in the model by constructing a matrix of allele frequencies at each locus in each lake in the study area. In a recent landscape genetic analysis of cougars (Puma concolor) (McRae et al., 2005), non-metric multidimensional scaling of shared allele distances (Bowcock et al., 1994) was used to detect a spatial discontinuity associated with desert and grassland habitats in the southwestern USA.

Landscape metrics

Landscape metrics can be applied to landscape genetics studies to quantify landscape structure in multivariate models that explore genetic variation. Landscape metrics quantify the spatial arrangement of landscape patches representing three levels: landscape, class and patch (for reviews, McGarigal and Marks, 1995; Haines-Young and Chopping, 1996). Landscape-level metrics quantify a cumulative structural measure (e.g. mean patch size) of all patches. Class metrics quantify a particular level (e.g. specific vegetation type), and patch metrics describe each individual patch (e.g. amount of edge relative to interior). Landscape metrics can be particularly powerful in metapopulation analysis where patch dynamics (e.g. connectivity and/or fragmentation) can greatly influence demography and gene flow. While methods for understanding influences of patch dynamics on ecological process have been well-developed in landscape ecology (O'Neill et al., 1988; McGarigal and Marks, 1995; Gustafson, 1998), they have been used less often in landscape genetics. However, Keyghobadi et al. (2005b) used landscape metric analysis of mean patch size and distance to edge to demonstrate that habitat fragmentation was associated with higher genetic differentiation and lower gene flow in an alpine meadow-dwelling butterfly (Parnassius smintheus) than in more contiguous landscapes.
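
A simple landscape-level metric such as mean patch size can be sketched by labelling contiguous habitat cells in a categorical raster. The example below assumes four-neighbour connectivity and a hypothetical binary habitat map; dedicated landscape-metric software offers many more metrics and connectivity rules than this illustration.

```python
from collections import deque

def patch_sizes(grid, habitat=1):
    """Sizes (in cells) of habitat patches under 4-neighbour connectivity."""
    rows, cols = len(grid), len(grid[0])
    seen = set()
    sizes = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != habitat or (r, c) in seen:
                continue
            # Flood-fill one patch starting from this cell
            seen.add((r, c))
            queue = deque([(r, c)])
            size = 0
            while queue:
                cr, cc = queue.popleft()
                size += 1
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and grid[nr][nc] == habitat
                            and (nr, nc) not in seen):
                        seen.add((nr, nc))
                        queue.append((nr, nc))
            sizes.append(size)
    return sizes

# Hypothetical habitat map: 1 = habitat, 0 = matrix
grid = [
    [1, 1, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 1, 0],
]
sizes = patch_sizes(grid)
print(sizes)                    # [3, 2, 3]
print(sum(sizes) / len(sizes))  # mean patch size in cells
```

Patch-level values (the individual sizes) and the landscape-level summary (their mean) correspond to two of the three levels of metrics described above.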

Spatial interpolation

A relatively promising but underutilized methodology for landscape genetics is spatial interpolation, which can be particularly useful for continuously distributed species and derived data (as collected under systematic or dense random sampling designs). Interpolation techniques can be used to represent allele frequency data across a landscape surface. Interpolation provides a way to predict values and corresponding levels of uncertainty for the variable of interest between points where observations have been made. Previous studies that applied interpolation typically used mitochondrial DNA markers (Barbujani et al., 1989; Dupanloup et al., 2002), which generally lack the resolution to reveal the fine-scale structure necessary for landscape genetic analysis. However, Piertney et al. (1998) interpolated a map of allelic composition by kriging the scores from the first axis of a principal component analysis of allele frequencies to examine spatial trends in a microsatellite study of genetic variation of the red grouse (Lagopus lagopus scoticus). Kriging uses a modeled semivariogram to create an interpolated continuous surface from the observed data (Isaaks and Srivastava, 1989). When a kriged principal component surface was overlaid with habitat in a GIS, barriers to gene flow were visually found to coincide with areas of unsuitable habitat.
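
The idea of predicting a value between sampled points can be sketched with inverse-distance weighting (IDW). Note that IDW is a simpler, deterministic stand-in used here for illustration only: unlike kriging, it does not use a modeled semivariogram and does not provide uncertainty estimates. The function name, the power parameter and the sample values are assumptions.

```python
import math

def idw(samples, target, power=2.0):
    """Inverse-distance-weighted estimate at `target`.

    `samples` is a list of (x, y, value) tuples, e.g. an allele frequency
    or an ordination score recorded at each sampling site.
    """
    num = 0.0
    den = 0.0
    for x, y, value in samples:
        d = math.dist((x, y), target)
        if d == 0.0:
            return value  # target coincides with a sampling site
        w = 1.0 / d ** power  # closer sites receive larger weights
        num += w * value
        den += w
    return num / den

# Hypothetical allele frequencies at three sampling sites
samples = [(0, 0, 0.2), (2, 0, 0.8), (1, 3, 0.5)]
print(idw(samples, (1, 0)))  # estimate midway between the first two sites
```

Evaluating such a function over every cell of a raster yields a continuous surface that could be overlaid with habitat layers in a GIS.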

Future directions

Future advances in this field will depend on the development of new methods for translating genetic data into a form that can be analyzed with well-developed techniques from spatial analysis and landscape ecology. Several directions could improve our ability to integrate landscape-level analyses with population genetic data. These include: improving the representation of genetic data for spatial analysis, improving power for analysis of noisy ecological data, expanding available techniques to include multivariate approaches, validating models, and simulating landscape and species distribution changes.

Representation of genetic data

It is critical to recognize that genetic data are different from ecological data typically collected for spatial analysis. In population genetics, data most often take the form of a multilocus genotype, which is not a direct measurement of a result of a spatial process (e.g. microclimate). These multilocus genotypes are only meaningful in relation to other individuals or populations, while most ecological field measurements are meaningful as independent data points (e.g. soil moisture at point y). However, spatial analysis methods typically require data to be collected as comparable point values that are geographically referenced to sampling locations on the landscape (Epperson, 2003).

If multilocus genetic data can be represented as point data, point pattern statistics are a powerful set of spatial statistics for identifying local autocorrelation structure (Anselin, 1995), clustering and spatial dependence (Diggle, 2003). For example, point pattern statistics have been used to map risk of sudden oak death according to landscape position, topography and solar insolation (Kelly and Meentemeyer, 2002). Two recent studies have demonstrated the potential for using point pattern statistics with spatially referenced genotypic data (Shimatani, 2002; Shimatani and Takahashi, 2003). For example, Shimatani (2002) used point pattern statistics to show that a secondary growth beech (Fagus crenata) forest patch contained trees from both the seed source population and the harvested population, suggesting that reproduction before harvest had provided important recruitment and genetic diversity to the population. A promising future direction would be to use point pattern statistics to identify clusters of closely related individuals and then map potential dispersal corridors based on associated landscape variables.

A second major consideration in the use of genetic data with landscape or ecological variables is that genetic variation results from both historic and contemporary processes. Therefore, current genetic variation may be more representative of processes that occurred as a result of a previous landscape configuration. If sufficient data are available, genetic data should be tested with both past and present landscapes to determine the strongest association. For example, Keyghobadi et al. (2005a) found that while genetic differentiation had the strongest correlation with contemporary habitat fragmentation, heterozygosity was better explained by fragmentation that existed 40 years in the past. These results are supported by recent work in the bush-cricket, Metrioptera roeseli, where landscape changes have occurred on a faster timescale than that at which resulting genetic changes can be detected (Holzhauer et al., 2006).

Noisy data

Both ecological and genetic data can have a high level of noise relative to statistical signal. One possible approach to addressing this challenge is the application of algorithmic models, powerful analysis tools for complex data sets that can incorporate numerous independent and dependent variables. Algorithmic models have been used to describe the complex relationship between environmental conditions, spatial arrangement and species abundance (Déath and Fabricius, 2000; Aitkenhead et al., 2004). Classification and Regression Trees (CART; Breiman et al., 1984) and random forests (Breiman, 2001) are non-parametric models that partition independent variables into levels that span the range of the dependent variable, and are particularly robust with variables that are categorical or ranked (e.g. land cover type, soil type). In addition, classification tree type models are robust to overfitting (Breiman, 2001) and are capable of dealing with multiple non-linear relationships. These types of models could be applied in landscape genetics to model multiple response variables including genetic discontinuities, migration rates and relatedness.

Improving multivariate analyses

One of the difficulties with many of the currently applied techniques, such as the Mantel test or Moran's I, is the inability to employ multiple variables and interaction terms in a single model. Spatial regression models, such as autoregression, have been widely used in landscape ecology to evaluate the association between abundance and multiple environmental variables. Keitt et al. (2002) tested a variety of conditional (CAR) and simultaneous (SAR) autoregressive models, as well as exponential (ER) and Gaussian (GA) geostatistical models to account for spatial autocorrelation in species-habitat relationships. CAR and SAR models build on traditional linear models by incorporating a local spatial autocorrelation term, while ER and GA model the correlation between errors as a function of distance, thus accounting for spatial pattern. In one of their three case studies of bank vole (Clethrionomys glareolus) habitat selection among five environmental variables, vole abundance increased with vegetation height (which provided shelter from predators) and decreased with lichen and moss cover (indicative of dry habitats) (Keitt et al., 2002). Autoregressive analyses are preferable over standard regression analyses when there is significant spatial autocorrelation (Fortin and Dale, 2005), which is often the case with genetic data. Autoregressive and geostatistical models could be applied in a landscape genetic context after converting multilocus genotypes into a measure of a spatial process.

Model validation

Once sampling has occurred, genotypic data have been generated, and a model has been constructed, it is important to test the significance of the selected landscape genetic model(s). Ideally, researchers could replicate study areas and analyses when evaluating models; however, logistics may prevent replication due to the nature of ecological systems and financial and/or time constraints. One way to assess generalization of results is to implement permutation or simulation tests to determine whether observed relationships between landscape variables and genetic variation could be due to random chance alone (Gardner et al., 1987; Turner, 2005). Neutral landscapes are random realizations of the landscape and can be used to rule out correlations due to chance or extraneous influences (Gardner et al., 1987). Simulated neutral landscapes that follow a function of statistical similarity represented in the empirical landscape (e.g. autocorrelation) (Gardner et al., 1987; Saura, 2002) can be used to generate a null distribution, and p-values can be obtained via permutation (Li and Wu, 2004). By using this approach, researchers can determine if observed relationships fall outside the 95% confidence interval of the null distribution, thus suggesting statistical significance of model variable(s). Monte Carlo simulation can also be used to quantify uncertainty in a random field when using an interpolation model such as kriging or when testing the uncertainty of a particular landscape variable used in a model. The null hypothesis that all genetic observations are spatially independent of one another can be empirically rejected by bootstrapping or other randomization tests.
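
The logic of building a null distribution by permutation can be sketched with a simple Mantel-type test, in which the site labels of a genetic distance matrix are shuffled and the correlation with a landscape distance matrix is recomputed each time. This is an illustration of the permutation step only, not a neutral landscape simulation; the function names, the one-sided test, the number of permutations and the example matrices are assumptions.

```python
import random

def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def upper_triangle(matrix, order):
    """Pairwise values above the diagonal, with sites taken in `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]

def mantel_test(genetic, landscape, n_perm=999, seed=0):
    """Observed correlation and one-sided permutation p-value."""
    rng = random.Random(seed)
    ids = list(range(len(genetic)))
    land = upper_triangle(landscape, ids)
    observed = pearson(upper_triangle(genetic, ids), land)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        perm = ids[:]
        rng.shuffle(perm)  # reassign genetic data to sites at random
        if pearson(upper_triangle(genetic, perm), land) >= observed:
            hits += 1
    return observed, hits / (n_perm + 1)

# Hypothetical data: five sites on a line, genetic distance proportional to distance
landscape = [[abs(i - j) for j in range(5)] for i in range(5)]
genetic = [[0.1 * abs(i - j) for j in range(5)] for i in range(5)]
r, p = mantel_test(genetic, landscape)
print(r, p)
```

Shuffling whole sites, rather than individual matrix entries, preserves the dependence among pairwise distances that share a site. The same scheme could compare genetic distance with least-cost or other landscape-based distances.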

Predicting landscape and species distribution changes

Predicting the effects of landscape change on genetic variation is an important future application of landscape genetics. For example, one may simulate the effect of different proposed road widths or alternative forest harvest plans on spatial patterns of genetic variation for a species of concern. This type of study may allow managers to choose a management alternative that minimizes impacts on the focal species while allowing some development to take place. Using landscape change simulation models, such as LANDIS (Mladenoff, 2004), in conjunction with habitat suitability and metapopulation models, such as RAMAS (Akçakaya, 2002), researchers can model population viability across different hypothetical (or alternative proposed) landscapes. While these models require demographic data for the species of interest, genetic data could greatly inform dispersal and habitat suitability parameters of the models.

Landscape genetic studies also hold promise for application to predicting and controlling spread of invasive species (Sakai et al., 2001) or escape of transgenes (Pilson and Prendeville, 2004; Marvier and Van Acker, 2005). For example, identification of landscape barriers may allow researchers to predict locations of barriers to invasive species spread and thus the potential extent of the damage. At a broader scale, researchers could use spatial data to investigate the landscape-level distribution of invasive species to simulate possible spread. Similarly, simulation modeling may allow researchers to determine the type and size of barrier that could be implemented to contain invasive species or transgenes. For example, when planting transgenic crops, simulations could be conducted to determine the appropriate buffer size such that transgenic pollen is unlikely to escape and thus avoid potential hybridization with native species (Rieger et al., 2002).

Conclusions

The goals of this review are to: discuss landscape genetics as a discipline, suggest appropriate methods for the sampling design, and show clear benefits of utilizing well-developed methods in landscape ecology and spatial statistics for analysis. The diversity of vocabulary (e.g. Table 1), methods (e.g. Tables 2 and 3) and ideas presented in this review clearly indicates the challenges for collaborations among experts in population genetics, landscape ecology and spatial statistics. Experts working in these disciplines do not communicate regularly, and such communication is critical for advancement of landscape genetics as a discipline. The field of landscape genetics has been moving forward rapidly, with much recent literature devoted to using landscape genetics as a tool (Table 3). To advance the field, we recommend that workshops and courses be developed to foster cross-disciplinary communication and collaborations that will advance the methodology and approaches used in landscape genetics. In 2005, the International Association of Landscape Ecologists contributed greatly to this effort by holding a landscape genetics symposium at their annual meeting, and the ideas from this conference are now shared with the broader community in a special issue of Landscape Ecology (Holderegger and Wagner, 2006). We look forward to a future with increased discussion and collaboration among geneticists, landscape ecologists and spatial statisticians.