Elsevier

Ecological Modelling

Volume 145, Issues 2–3, 15 November 2001, Pages 111-121

Assessing habitat-suitability models with a virtual species

https://doi.org/10.1016/S0304-3800(01)00396-9

Abstract

This paper compares two methods for assessing habitat suitability, the Ecological Niche Factor Analysis (ENFA) and the Generalised Linear Model (GLM), to see how well they cope with three different scenarios. The main difference between these two analyses is that GLM is based on species presence/absence data, whereas ENFA uses presence data only. A virtual species was created and then distributed across a geographic information system model of a real landscape following three historical scenarios: (1) a spreading species, (2) a species at equilibrium, and (3) an overabundant species. In each situation, the virtual species was sampled, and these simulated data sets were used as input for ENFA and GLM to reconstruct the habitat suitability model. The results showed that ENFA is very robust to the quality and quantity of the data, giving good results in all three scenarios. GLM was badly affected in the case of the spreading species but produced slightly better results than ENFA when the species was overabundant; at equilibrium, both methods produced equivalent results. The use of a virtual species proved to be a very efficient method, allowing one to fully control the quality of the input data and to accurately evaluate the predictive power of both analyses.

Introduction

Prediction of species distribution is an important element of conservation biology. Management of endangered species (Palma et al., 1999, Sanchez-Zapata and Calvo, 1999), ecosystem restoration (Mladenoff et al., 1997), species re-introductions (Breitenmoser et al., 1999), population viability analyses (Akçakaya et al., 1995, Akçakaya and Atwood, 1997) and human–wildlife conflicts (Le Lay et al., 2001) often rely on habitat-suitability modelling. Multivariate models are commonly used to define habitat suitability and, combined with geographical information systems (GIS), allow one to create potential distribution maps (Guisan and Zimmermann, 2000).

Numerous multivariate analyses have been developed for building habitat suitability or abundance models, but very few studies have compared their predictive power (for example, Lek et al., 1996, Paruelo and Tomasel, 1997, Guisan et al., 1999, Manel et al., 1999, Özesmi and Özesmi, 1999). In this paper, we compare a common method, the Generalised Linear Model (GLM) (for example, Austin et al., 1984, Augustin et al., 1996, Guisan et al., 1998), with the Ecological Niche Factor Analysis (ENFA), a new multivariate analysis (Hirzel et al., in press).

GLM is a generalisation of multiple regression analysis; here it is used with a binomial error distribution and a logistic link, and it may fit polynomial terms of degree higher than one. The dependent variable (presence/absence of the species) is explained through its logit, which is modelled as a weighted sum of ecogeographical predictors. The weights are estimated so as to give the best fit between the model and the calibration data set (Jongman et al., 1987, Nicholls, 1989).
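As an illustration of the GLM described above, here is a minimal Python/NumPy sketch of a binomial GLM with a logit link, fitted by iteratively reweighted least squares (IRLS). The quadratic terms (degree=2, no interactions) and the helper names `design_matrix`, `fit_logistic_glm` and `predict` are assumptions made for illustration; this is not the code used in the study, which the acknowledgements indicate was run in S-PLUS.

```python
import numpy as np

def design_matrix(X, degree=2):
    """Intercept plus each predictor raised to powers 1..degree (no interactions).

    Hypothetical layout: [1, x1, x1^2, x2, x2^2, ...] for degree=2.
    """
    cols = [np.ones(X.shape[0])]
    for j in range(X.shape[1]):
        for p in range(1, degree + 1):
            cols.append(X[:, j] ** p)
    return np.column_stack(cols)

def fit_logistic_glm(X, y, degree=2, n_iter=50, tol=1e-8):
    """Fit a binomial GLM with logit link by IRLS; y holds 1 (presence) or 0 (absence)."""
    A = design_matrix(X, degree)
    beta = np.zeros(A.shape[1])
    for _ in range(n_iter):
        eta = A @ beta                               # linear predictor (logit scale)
        mu = 1.0 / (1.0 + np.exp(-eta))              # fitted probability of presence
        w = np.clip(mu * (1.0 - mu), 1e-10, None)    # IRLS weights = binomial variance
        z = eta + (y - mu) / w                       # working response
        AtW = A.T * w                                # A' W without forming diag(w)
        beta_new = np.linalg.solve(AtW @ A, AtW @ z) # weighted least-squares step
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def predict(beta, X, degree=2):
    """Predicted probability of presence for new predictor values."""
    return 1.0 / (1.0 + np.exp(-(design_matrix(X, degree) @ beta)))
```

The quadratic term lets the fitted response rise and then fall along a predictor, which is one common way to give a GLM a unimodal, niche-like response.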

ENFA compares the distribution of the ecogeographical predictors over a presence data set (locations where the species has been detected) with their distribution over the whole area. Like Principal Component Analysis, ENFA summarises all predictors into a few uncorrelated factors that retain most of the information. Unlike PCA, however, these factors have an ecological meaning: the first factor is the ‘marginality’, which reflects the direction in which the species niche differs most from the conditions available in the global area. Subsequent factors represent the ‘specialisation’. They are extracted successively by computing the direction that maximises the ratio of the variance of the global distribution to that of the species distribution. The first few factors account for a large part of the information. The species distribution on these factors is used to compute a habitat suitability index for any set of descriptor values (Hirzel et al., in press).
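The two ENFA ingredients described above can be sketched in NumPy, under simplifying assumptions. Marginality is taken here as the vector from the global mean to the species mean in globally standardised units. Specialisation directions are obtained by maximising the variance ratio (u′R_G u)/(u′R_S u) as a generalised symmetric eigenproblem, solved through a Cholesky whitening of the species covariance R_S. The constraint that ENFA's specialisation factors be extracted orthogonally to the marginality factor, and any further normalisation used by Hirzel et al., are deliberately omitted. All function names are hypothetical.

```python
import numpy as np

def standardise(global_X, species_X):
    """Centre and scale both data sets with the global mean and SD (rows = cells)."""
    mu = global_X.mean(axis=0)
    sd = global_X.std(axis=0, ddof=1)
    return (global_X - mu) / sd, (species_X - mu) / sd

def marginality(global_Z, species_Z):
    """Simplified marginality: species mean minus global mean, and its length."""
    m = species_Z.mean(axis=0) - global_Z.mean(axis=0)
    return m, np.linalg.norm(m)

def specialisation_axes(global_Z, species_Z):
    """Directions u maximising (u' R_G u) / (u' R_S u), by decreasing ratio.

    Simplification: no orthogonality constraint with respect to marginality.
    """
    R_G = np.cov(global_Z, rowvar=False)     # covariance over the whole area
    R_S = np.cov(species_Z, rowvar=False)    # covariance over presence cells
    L = np.linalg.cholesky(R_S)              # R_S = L L'
    L_inv = np.linalg.inv(L)
    C = L_inv @ R_G @ L_inv.T                # whitened symmetric problem
    ratios, V = np.linalg.eigh(C)            # eigenvalues in ascending order
    U = L_inv.T @ V                          # back-transform: u = L^{-T} v
    order = np.argsort(ratios)[::-1]
    return ratios[order], U[:, order]
```

A ratio well above 1 on an axis means the species occupies a much narrower range of conditions along that axis than is available in the area, which is what the text calls specialisation.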

In practice, the main difference between these analyses lies in the input data: GLM needs presence/absence data, whereas ENFA needs only presence data. ENFA is thus much less demanding than GLM, which makes a comparison of their predictive power worthwhile. Obviously, this power depends on the situation: for example, when absence data are reliable, GLM could gain extra power by using this information, but in other situations it could be misled by false absences (McArdle, 1990, Solow, 1993; see also ‘stochastic zeros’ in Welsh et al., 1996).

The goal of this paper is thus to circumscribe the domain of application of both methods with respect to the quality of absence data. This is a more complex task than simply comparing analyses on the same data set. Indeed, measuring their sensitivity to various data qualities entails exploring several distribution patterns of ecologically identical species in a common landscape. As such species could not live simultaneously in the same place, such data cannot be found in the real world; it is therefore necessary to generate simulated species distribution data. This approach has two further advantages: (1) the input data set can be fully controlled, qualitatively as well as quantitatively; and (2) because the ‘reality’ is perfectly known, assessing model accuracy is straightforward and unambiguous. Nevertheless, to track reality as closely as possible, the environmental predictors were taken from a real area in the Swiss Alps.

Section snippets

Methods

This study required building a virtual species completely characterised by its ecological niche, which was modelled by a ‘truth’ habitat suitability map. Three data sets were then generated, simulating three different scenarios. These data sets, in conjunction with environmental variables, were fed into the GLM and ENFA analyses, which produced ‘predicted’ habitat suitability maps. Finally, the resulting models were evaluated by statistically comparing each ‘predicted’ map with the ‘truth’ map.
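The virtual-species pipeline can be sketched in NumPy on a synthetic grid. Everything specific here is a hypothetical assumption, not the authors' procedure: the ‘truth’ map is a Gaussian response to each predictor, the equilibrium species occupies each cell with probability equal to its suitability, the overabundant species uses the square root of suitability (so poorer cells are also occupied), and the spreading species follows the equilibrium rule but has only colonised the western third of the grid. Under this rule, surveyed suitable cells outside the colonised area produce false absences.

```python
import numpy as np

def truth_map(env, optimum, width):
    """Hypothetical 'truth' suitability in (0, 1]: Gaussian response per predictor.

    env has shape (rows, cols, n_predictors); optimum and width have shape (n_predictors,).
    """
    z = (env - optimum) / width
    return np.exp(-0.5 * np.sum(z ** 2, axis=-1))

def occupancy(hs, scenario, rng):
    """Hypothetical occupancy rules for the three scenarios (boolean grid)."""
    u = rng.random(hs.shape)
    if scenario == "equilibrium":
        return u < hs                          # occupancy tracks suitability
    if scenario == "overabundant":
        return u < np.sqrt(hs)                 # sqrt(hs) >= hs: poorer cells occupied too
    if scenario == "spreading":
        front = hs.shape[1] // 3               # only the western third colonised so far
        colonised = np.zeros(hs.shape, dtype=bool)
        colonised[:, :front] = True
        return (u < hs) & colonised            # suitable but uncolonised cells stay empty
    raise ValueError(f"unknown scenario: {scenario}")

def sample_cells(occupied, n, rng):
    """Survey n distinct random cells; return flat indices and presence/absence records."""
    idx = rng.choice(occupied.size, size=n, replace=False)
    return idx, occupied.ravel()[idx]
```

Using the same random numbers `u` for every scenario makes the scenarios nested, which keeps the comparison between them about occupancy rules rather than sampling noise.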

Results

The equilibrium and overabundance scenarios were each addressed with two sample sizes (300 and 1200 points) for both analyses (ENFA and GLM), and the spreading scenario with 300 sample points only, giving a total of ten habitat suitability maps.
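The total of ten maps follows from counting the scenario/sample-size combinations and multiplying by the two analyses:

```latex
N_{\text{maps}} = \underbrace{(2 \times 2 + 1 \times 1)}_{\text{scenario/sample-size combinations}} \times \underbrace{2}_{\text{ENFA, GLM}} = 5 \times 2 = 10
```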

In order to compare the predictive power of these ‘result’ maps, the proportion of explained variance (R2) was computed on a sample of 250 pairs of points taken in the ‘result’ map and in the ‘truth’ map. This coefficient was computed ten times for each map
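The repeated R² comparison can be sketched in NumPy, assuming that R² is the squared Pearson correlation between paired ‘truth’ and ‘result’ values and that each set of 250 points is drawn uniformly at random without replacement. The snippet specifies neither detail, and `sampled_r2` is a hypothetical helper.

```python
import numpy as np

def sampled_r2(truth, predicted, n_pairs=250, n_repeats=10, rng=None):
    """Mean and SD of R^2 over repeated random samples of paired map cells.

    Assumption: R^2 = squared Pearson correlation between truth and prediction.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.ravel(truth)
    p = np.ravel(predicted)
    r2 = np.empty(n_repeats)
    for k in range(n_repeats):
        idx = rng.choice(t.size, size=n_pairs, replace=False)  # same cells in both maps
        r = np.corrcoef(t[idx], p[idx])[0, 1]
        r2[k] = r ** 2
    return r2.mean(), r2.std(ddof=1)
```

Repeating the sampling gives a spread (S.D.) for each map in addition to its mean, which is the form in which goodness of fit is reported in the Discussion.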

Discussion

The three scenarios were modelled with unequal success by the two analyses. ENFA appeared to be very robust to data quality and quantity: none of the investigated cases showed a significantly better or poorer fit, and the overall goodness of fit was good, with an average proportion of explained variance of 0.58 (S.D.=0.02). GLM, on the other hand, was moderately sensitive to data quality but not to data quantity (average explained variance, 0.52; S.D.=0.11).

Relying on absence data is

Conclusions

  • 1.

    This paper gives insight into the domains of application of GLM and ENFA. The robustness of ENFA makes it particularly suitable and efficient when the quality of the data is either poor (the absence data are unreliable) or unknown. GLM offers slightly better results when the available presence/absence data are sufficiently good.

  • 2.

    Virtual species simulation proved to be useful when assessing analysis predictive power in spatial ecology, allowing one to achieve a more accurate

Acknowledgements

This work was funded by the Swiss Federal Office for Environment, Forest and Landscape (OFEFP) (grant 0310.3600.305) and by the Laboratory of Conservation Biology, Institute of Ecology, University of Lausanne. The authors wish to thank Karine Fattebert and Antoine Guisan for their kind help with S-PLUS and GLM methods. Pierre Dutilleul's and Jérôme Goudet's assistance during our struggle against spatial autocorrelation was also greatly appreciated. Many thanks to Sebastien Sachot, Antoine

References

  • H.R. Akçakaya et al. (1997). A habitat-based metapopulation model of the California Gnatcatcher. Conservat. Biol.
  • H.R. Akçakaya et al. (1995). Linking landscape data with population viability analysis: management options for the helmeted honeyeater Lichenostomus melanops cassidix. Biol. Conservat.
  • J.R. Alldredge et al. (1992). Further comparison of some statistical techniques for analysis of resource selection. J. Wildl. Manag.
  • N.H. Augustin et al. (1996). An autologistic model for the spatial distribution of wildlife. J. Appl. Ecol.
  • M.P. Austin et al. (1984). New approaches to direct gradient analysis using environmental scalars and statistical curve-fitting procedures. Vegetatio.
  • U. Breitenmoser et al. (1999). Beurteilung des Kantons St.Gallen als Habitat für den Luchs [Assessment of the Canton of St. Gallen as habitat for the lynx].
  • P. Clifford et al. (1989). Assessing the significance of the correlation between two spatial processes. Biometrics.
  • P. Dutilleul (1993). Modifying the t-test for assessing the correlation between two spatial processes. Biometrics.