Abstract
Wetland ecosystems are of primary concern for nature conservation and restoration. Adequate conservation and restoration strategies depend on a scientific understanding of wetland properties and processes. In particular, it is important to understand how plant species and vegetation patterns relate to environmental gradients. The modelling approaches in this study statistically relate vegetation patterns to measured environmental gradients in a lowland wetland ecosystem. The measured gradients covered groundwater quantity and quality, soil properties and vegetation management. The objective was to identify, among these, the key environmental gradients constraining the vegetation, using recently developed methodologies within the modelling approaches. A comparison of the results showed that different methodologies identified different environmental gradients as important.
Abbreviations
- MLR: Multiple logistic regression
- RF: Random forest
Acknowledgements
The authors wish to thank the special research fund (BOF, project nr 011/015/04) of Ghent University, and the Fund for Scientific Research-Flanders (operating and equipment grant 1.5.108.03). We are grateful to Willy Huybrechts and Piet De Becker from the Institute of Nature Conservation, Belgium, for providing the data gathered through the Flemish Research Programme on Nature Development (projects VLINA 96/03 and VLINA 00/16), and to Rudi Hoeben for computer assistance.
Appendix A: Variable importance measure
The algorithm for estimating the importance of each predictive variable within random forests is as follows:

(i) For i = 1 to k, with k the number of classification trees in the random forest:
    (1) apply tree i to the n out-of-bag (oob) elements and count the number of correct classifications over the n oob elements (\(C_{i,\text{untouched}}\));
    (2) for j = 1 to p (with p the total number of variables):
        (a) take the n untouched oob elements;
        (b) randomly permute the values of variable j in the n oob elements;
        (c) apply tree i to all the variable-j-permuted oob elements;
        (d) count the number of correct classifications (\(C_{i,j\text{-permuted}}\));
        (e) subtract the number of correct classifications of the variable-j-permuted oob elements from the number of correct classifications of the untouched oob elements and divide by the number of oob elements: \(\Delta C_{i,j} = (C_{i,\text{untouched}} - C_{i,j\text{-permuted}})/n\).

These iterations yield p groups (one per variable, j = 1 to p) of k values \(\Delta C_{i,j}\) (one per tree, i = 1 to k). Since trees are independent, correlations among the \(\Delta C_{i,j}\) values within each of the p groups are generally low. Finally:

(ii) For each of the j = 1 to p groups, the mean \(\Delta C_{i,j}\) over all i = 1 to k trees is calculated: \(\overline{\Delta C_j} = \sum_{i=1}^{k} \Delta C_{i,j}/k\). The value \(\overline{\Delta C_j} \times 100\) is referred to as the ‘mean importance score’ of variable j. A \(\Delta C_{i,j}\) value is positive when \(C_{i,\text{untouched}} > C_{i,j\text{-permuted}}\) and negative when \(C_{i,\text{untouched}} < C_{i,j\text{-permuted}}\). Mean importance scores are high when permuting the values of variable j increases the classification error.

(iii) Since correlations of the \(\Delta C_{i,j}\) scores are generally low within each of the j = 1 to p groups, a standard error (se) can be calculated for each group of k scores. Dividing \(\overline{\Delta C_j}\) by this standard error gives a z-score for variable j, to which a significance level is assigned assuming normality.
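Steps (i) to (iii) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function name `importance_scores`, the representation of each tree as a plain callable, the `oob_sets` argument and the toy data are all hypothetical. The one-sided normal tail probability is likewise one possible reading of "assign a significance level assuming normality".

```python
import math
import random
import statistics


def importance_scores(trees, oob_sets, X, y, seed=0):
    """Permutation importance following steps (i)-(iii) of Appendix A.

    trees    : list of k callables; trees[i](row) returns a predicted class
    oob_sets : list of k lists holding the out-of-bag row indices of each tree
    X, y     : predictor rows (lists of p values) and observed classes
    Returns one (mean importance score, z-score, p-value) tuple per variable.
    """
    rng = random.Random(seed)
    p = len(X[0])
    deltas = [[] for _ in range(p)]  # p groups of k delta-C values

    for tree, oob in zip(trees, oob_sets):
        n = len(oob)
        # Step (1): correct classifications on the untouched oob elements.
        c_untouched = sum(tree(X[r]) == y[r] for r in oob)
        for j in range(p):
            # Steps (a)-(b): copy the oob rows, then permute variable j only.
            rows = [list(X[r]) for r in oob]
            column = [row[j] for row in rows]
            rng.shuffle(column)
            for row, value in zip(rows, column):
                row[j] = value
            # Steps (c)-(d): classify the permuted rows and count the hits.
            c_permuted = sum(tree(row) == y[r] for row, r in zip(rows, oob))
            # Step (e): per-tree drop in the fraction classified correctly.
            deltas[j].append((c_untouched - c_permuted) / n)

    results = []
    for group in deltas:
        mean_delta = statistics.fmean(group)                    # step (ii)
        se = statistics.stdev(group) / math.sqrt(len(group))    # step (iii)
        if se > 0:
            z = mean_delta / se
            p_value = 0.5 * math.erfc(z / math.sqrt(2))  # normal upper tail
        else:
            z = p_value = float("nan")  # no spread across trees: z undefined
        results.append((100 * mean_delta, z, p_value))
    return results


# Toy data (assumption): the class depends on variable 0 only.
X = [[0.1, 5], [0.9, 3], [0.2, 8], [0.8, 1], [0.3, 7], [0.7, 2]]
y = [0, 1, 0, 1, 0, 1]
stump = lambda row: int(row[0] > 0.5)  # hypothetical stand-in for a tree
trees = [stump, stump, stump]
oob_sets = [[0, 1, 2, 3], [2, 3, 4, 5], [0, 1, 4, 5]]
scores = importance_scores(trees, oob_sets, X, y)
```

In this toy setup the stand-in trees never look at variable 1, so permuting it cannot change any classification and its mean importance score is exactly zero. Variable 0 can only lose correct classifications when permuted, so its score is non-negative.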
Cite this article
Peters, J., Verhoest, N.E.C., Samson, R. et al. Wetland vegetation distribution modelling for the identification of constraining environmental variables. Landscape Ecol 23, 1049–1065 (2008). https://doi.org/10.1007/s10980-008-9261-4
DOI: https://doi.org/10.1007/s10980-008-9261-4