Elsevier

Ecological Informatics

Volume 9, May 2012, Pages 37-46
Support vector machines to map rare and endangered native plants in Pacific islands forests

https://doi.org/10.1016/j.ecoinf.2012.03.003

Abstract

Accurate knowledge of the ecological and geographic range of rare and endangered species is critical for biodiversity conservation and management. In this study, we used support vector machines (SVM) to model rare species distributions and compared them with another emerging machine learning classifier, random forests (RF). The comparison was performed using three native and endemic plants found at low to mid elevation on the island of Moorea (French Polynesia, South Pacific) and considered rare because of scarce occurrence records: Lepinia taitensis (28 observed occurrences), Pouteria tahitensis (20 occurrences) and Santalum insulare var. raiateense (81 occurrences). We selected a set of biophysical variables to describe plant habitats on tropical high volcanic islands, including topographic descriptors and an overstory vegetation map. The former were extracted from a digital elevation model (DEM) and the latter results from an SVM classification of spectral and textural bands from very high resolution Quickbird satellite imagery. Our results show that SVM slightly but consistently outperforms RF in predicting the distribution of rare species, based on the kappa coefficient and the area under the curve (AUC) achieved by both classifiers. The predicted potential habitats of the three rare species are considerably wider than their currently observed distribution ranges. We hypothesize that this discrepancy is caused by strong anthropogenic disturbances that have impacted low- to mid-elevation forests in the past and present. There is an urgent need to set up conservation strategies for the endangered plants found in these shrinking habitats on the Pacific islands.

Highlights

► SVM paradigm is adapted to rare species distribution modeling.
► SVM can be more accurate than random forests.
► Target species rarity is a consequence of past and present human impacts.

Introduction

Detailed knowledge of the ecological range and geographic distribution of rare species is critical for biodiversity conservation and management (Ferrier, 2002, Rushton et al., 2004). Oceanic islands are famous for their unique biota with high endemism, but also for their great vulnerability to anthropogenic disturbances (Caujapé-Castells et al., 2010, Loope et al., 1988), which cause declines in species abundance and distribution and sometimes lead to extinction (Whittaker and Fernandez-Palacios, 2007). As a result, a large number of endangered species are currently found in island ecosystems (IUCN, 2011). Besides their conservation value, rare species may also play a key role in ecosystem functioning (Lyons and Schwartz, 2001, Lyons et al., 2005).

Occurrence records are scarce for rare species, so only small training samples are available for species distribution models (Pearson et al., 2007, Stockwell and Peterson, 2002, Wisz et al., 2008). A recent study by Williams et al. (2009) compared the ability of a range of models to predict the distribution of six rare plant species (from 9 to 129 occurrences). These models included generalized linear models, artificial neural networks, the commonly used maximum entropy (Maxent) approach and random forests (RF) (Breiman, 2001), an ensemble of classification and regression trees (CART); RF outperformed the other models. RF is an ensemble classifier developed to produce accurate predictions while limiting overfitting of the data. It consists of many decision trees, and each input vector is passed to every tree in the forest. Each tree gives a classification, and we say the tree “votes” for that class. The forest chooses the classification having the most votes over all the trees. RF has recently been used successfully for species distribution modeling (Benito Garzon et al., 2008, Cutler et al., 2007, Prasad et al., 2006, Williams et al., 2009). RF is an easy-to-use classifier since the user has to set only two parameters: the number of trees and the number of variables to be randomly selected from the available set of variables.
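The voting step described above can be sketched in a few lines of Python. This is only an illustration of majority voting, not the implementation used in the study: the three "trees" are hypothetical hand-written rules on (elevation, slope) with made-up thresholds, standing in for fitted decision trees.

```python
from collections import Counter

def forest_predict(trees, x):
    """Return the class that receives the most votes among the trees."""
    votes = [tree(x) for tree in trees]          # every tree sees the same input vector
    return Counter(votes).most_common(1)[0][0]   # class with the highest vote count

# Hypothetical stand-ins for fitted trees; x = (elevation in m, slope in degrees).
# The thresholds are invented for illustration only.
trees = [
    lambda x: "presence" if x[0] < 600 else "absence",
    lambda x: "presence" if x[1] > 20 else "absence",
    lambda x: "presence" if x[0] < 400 and x[1] > 10 else "absence",
]

print(forest_predict(trees, (350, 25)))  # all three trees vote "presence"
print(forest_predict(trees, (500, 5)))   # two of three trees vote "absence"
```

Using an odd number of trees avoids ties in the two-class case; with an even number, a tie-breaking rule would have to be chosen.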

Nonetheless, a machine learning algorithm from the field of remotely sensed data classification, support vector machines (SVM) (Vapnik, 1998), may be an important technique for modeling rare species distributions. Algorithms used in remotely sensed data classification for classifying object reflectance are substantially the same as those used in species distribution models for classifying environmental layers (Franklin, 1995). Accordingly, SVM has been used successfully for modeling the distribution of common species in a few recent studies (Drake et al., 2006, Guo et al., 2005, Pouteau et al., 2011a).

SVM was originally introduced as a binary classifier (Vapnik, 1998) and is extensively described by Burges (1998), Hsu et al. (2009) and Schölkopf and Smola (2002). In its classical implementation, it uses two classes (e.g. presence/absence) of training samples within a multidimensional feature space to fit an optimal separating hyperplane (in image classification, the vector component in each dimension is an image gray level). In this way, SVM tries to maximize the margin, that is, the distance between the hyperplane and the closest training samples, or support vectors.

SVM consists of projecting vectors into a high-dimensional feature space by means of the kernel trick, then fitting the optimal hyperplane that separates the classes using an optimization function. For a generic pattern x, the corresponding estimated label ŷ is given by Eq. (1):

$$\hat{y} = \operatorname{sign}\big(f(x)\big) = \operatorname{sign}\left[\sum_{i=1}^{N} y_i\,\alpha_i\,K(x_i, x) + b\right] \qquad (1)$$

where N is the number of training points, y_i is the label of the i-th sample, b is a bias parameter, K(x_i, x) is the chosen kernel and the α_i are the Lagrange multipliers.

Several kernels are used in the literature. According to Hsu et al. (2009), and as supported by many other authors, the Gaussian radial basis function (RBF) has two advantages: (i) it is very successful because it works in an infinite-dimensional feature space; and (ii) it has a single parameter γ > 0, unlike other well-performing kernels (e.g. polynomial). The kernel is given by Eq. (2):

$$K(x_i, x) = \exp\left(-\gamma\,\lVert x_i - x \rVert^{2}\right) \qquad (2)$$
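As a minimal sketch of the RBF kernel, the function below evaluates exp(-γ‖x_i − x‖²) for two feature vectors in plain Python. The function name and the example values are illustrative assumptions, not taken from the study.

```python
import math

def rbf_kernel(xi, x, gamma):
    """Gaussian RBF kernel: exp(-gamma * squared Euclidean distance)."""
    if gamma <= 0:
        raise ValueError("gamma must be strictly positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(xi, x))
    return math.exp(-gamma * sq_dist)

# Identical vectors give the maximum similarity of 1.
print(rbf_kernel((1.0, 2.0), (1.0, 2.0), gamma=0.5))
# Similarity decays toward 0 as the vectors move apart.
print(rbf_kernel((0.0, 0.0), (3.0, 4.0), gamma=0.1))
```

A larger γ makes the similarity fall off faster with distance, so each training sample influences a smaller neighborhood of the feature space.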

Noise in the data can be accounted for by tolerating some scatter of the training samples around the margin, thus relaxing the decision constraint. The regularization parameter that controls this tolerance is called C.
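For reference, the role of C is usually written through the textbook soft-margin formulation below. The weight vector w, the feature map φ and the slack variables ξ_i are standard notation introduced here for illustration; they do not appear in the text above.

```latex
\min_{w,\,b,\,\xi} \;\; \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{N} \xi_i
\qquad \text{subject to} \qquad
y_i \big( w \cdot \phi(x_i) + b \big) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \dots, N
```

A large C penalizes margin violations heavily and fits the training samples closely, whereas a small C tolerates more violations in exchange for a wider margin.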

Only the α_i associated with support vectors s_i are non-zero, so the classification function reduces to Eq. (3):

$$\hat{y} = \operatorname{sign}\big(f(x)\big) = \operatorname{sign}\left[\sum_{i=1}^{P_s} y_i\,\alpha_i\,K(s_i, x) + b\right] \qquad (3)$$

where P_s is the number of support vectors. Thus, the decision boundary is based solely on a few meaningful pixels. This is why SVM may be particularly appropriate for predicting the distribution of species with scarce occurrence records. Nevertheless, to our knowledge, it has never been used for rare species distribution modeling.
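The reduction from Eq. (1) to Eq. (3) can be checked numerically with a small sketch. All values below (training points, labels, multipliers, bias and γ) are hypothetical and hand-picked for illustration; they are not the output of an actual SVM fit.

```python
import math

def rbf(u, v, gamma):
    """Gaussian RBF kernel between two feature vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))

def decision_value(x, points, labels, alphas, b, gamma):
    """f(x) = sum_i y_i * alpha_i * K(x_i, x) + b."""
    return sum(y * a * rbf(p, x, gamma)
               for p, y, a in zip(points, labels, alphas)) + b

def predict(x, points, labels, alphas, b, gamma):
    """Estimated label: the sign of f(x) (zero is mapped to +1 here)."""
    return 1 if decision_value(x, points, labels, alphas, b, gamma) >= 0 else -1

# Hypothetical training set: +1 = presence, -1 = absence.
points = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0), (5.0, 5.0)]
labels = [1, 1, -1, -1]
alphas = [0.0, 0.8, 0.8, 0.0]   # only the two middle points act as support vectors
b, gamma = 0.0, 0.5

# Keep only the samples with a non-zero multiplier (the support vectors).
support = [(p, y, a) for p, y, a in zip(points, labels, alphas) if a > 0]
sv_points, sv_labels, sv_alphas = zip(*support)

x_new = (0.5, 0.5)
print(predict(x_new, points, labels, alphas, b, gamma))            # all N samples
print(predict(x_new, sv_points, sv_labels, sv_alphas, b, gamma))   # support vectors only
```

Both calls return the same label because samples with a zero multiplier contribute nothing to the sum, which is why the fitted model only needs to store the support vectors.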

The aim of this study is twofold: (i) to determine which of RF and SVM is the more relevant model for mapping rare species, in a case study focusing on endangered native and endemic plants on Pacific islands; and (ii) to compare their predicted potential habitat with their currently observed range, in order to understand the causes of their rarity and endangerment.

Target rare and endangered species

The present study was conducted on the oceanic tropical island of Moorea (Society archipelago, French Polynesia), located at 17°33′ South and 149°50′ West in the South Pacific Ocean. It is a small (ca. 140 km²) and young volcanic island (1.5–2.5 million years old) with a rugged topography and a highest summit reaching 1207 m elevation.

This work was part of the “Moorea Biocode Project”, an international research program seeking to collect DNA sequence, distribution, morphological and ecological

Vegetation map

The SVM classification of the Quickbird imagery (Fig. 5) gives fairly good results, with a kappa of 0.842 and an AUC of 0.965. Texture is arguably the most informative input, since the classification based on textural information alone (without spectral bands) gives a kappa of 0.821 and an AUC of 0.955 (data not shown).
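For readers unfamiliar with the two accuracy measures, the sketch below computes Cohen's kappa from predicted labels and a two-class AUC from classifier scores using their standard definitions. The function names and the toy data are illustrative assumptions; this is not the evaluation code of the study, and the AUC function covers only the binary case.

```python
def cohen_kappa(y_true, y_pred):
    """Kappa = (observed agreement - chance agreement) / (1 - chance agreement)."""
    y_true, y_pred = list(y_true), list(y_pred)
    n = len(y_true)
    classes = set(y_true) | set(y_pred)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    expected = sum((y_true.count(c) / n) * (y_pred.count(c) / n) for c in classes)
    return (observed - expected) / (1 - expected)

def auc(y_true, scores):
    """Probability that a random positive scores higher than a random negative
    (ties count one half); labels are 1 for positive and 0 for negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Toy validation data: 1 = presence, 0 = absence.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6]

print(round(cohen_kappa(y_true, y_pred), 3))
print(round(auc(y_true, scores), 3))
```

Kappa corrects raw agreement for the agreement expected by chance, while AUC does not depend on the threshold used to turn scores into labels. Note that `cohen_kappa` is undefined when chance agreement equals 1, for example when only one class is present.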

Contribution of biophysical descriptors

The relative contribution of the descriptors presented in Fig. 6 was calculated from the difference of AUC (Δ AUC) and the difference of kappa (Δ kappa) yielded with

Random forests vs. support vector machines

RF and SVM were compared on their ability to predict rare and endangered species distributions. RF was found to be optimal for predicting rare species occurrences among a wide panel of algorithms in Williams et al. (2009). To our knowledge, SVM has never been used for predicting rare species distribution. However, it generally outperforms RF in our case study, especially when the number of occurrences is small. The main reason is most likely the result of the paradigm of SVM based on a small

Conclusion

We compared two ecological niche models, random forests (RF) and support vector machines (SVM), in order to predict the distribution of rare species in island forest ecosystems. Our analysis focused on three endangered native and endemic plants with few occurrence records on the tropical oceanic island of Moorea (French Polynesia). It was based on six fine-scale environmental descriptors, namely elevation, slope steepness, slope aspect, windwardness, a compound topographic index (CTI)

Acknowledgments

The authors are grateful to Jean-François Butaud for sharing his GPS points of the target plants, Marie Fourdrigniez for her help during field surveys, the Service de l'Urbanisme of the Government of French Polynesia for providing the DEM, the Délégation à la Recherche of the Government of French Polynesia and the “Moorea Biocode Project” for financial support. We deeply thank Thomas W. Gillespie (Department of Geography, University of California, Los Angeles) for revising the English on an

References (81)

  • D.R.B. Stockwell et al. Effect of sample size on accuracy of species distribution models. Ecological Modelling (2002)
  • W. Turner et al. Remote sensing for biodiversity science and conservation. Trends in Ecology & Evolution (2003)
  • M.F. Augusteijn et al. Performance evaluation of texture measures for ground cover identification in satellite images by means of a neural network classifier. IEEE Transactions on Geoscience and Remote Sensing (1995)
  • S.S. Baboo et al. An analysis of different resampling methods in Coimbatore, District. Global Journal of Computer Science and Technology (2010)
  • Z. Baruch et al. Leaf construction cost, nutrient concentration, and net CO2 assimilation of native and invasive species in Hawaii. Oecologia (1999)
  • J.A. Benediktsson et al. Classification of multisource and hyperspectral data based on decision fusion. IEEE Transactions on Geoscience and Remote Sensing (1999)
  • J.A. Benediktsson et al. Neural network approaches versus statistical methods in classification of multisource remote sensing data. IEEE Transactions on Geoscience and Remote Sensing (1990)
  • M. Benito Garzon et al. Effects of climate change on the distribution of Iberian tree species. Applied Vegetation Science (2008)
  • L. Breiman. Random forests. Machine Learning (2001)
  • C.J.C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery (1998)
  • S. Carlquist. Island Biology (1974)
  • J. Chen et al. Microclimate in forest ecosystem and landscape ecology. Bioscience (1999)
  • D. Chen et al. Examining the effect of spatial resolution and texture window size on classification accuracy: an urban environment case. International Journal of Remote Sensing (2004)
  • H.T. Chu et al. Synergistic use of multi-temporal ALOS/PALSAR with SPOT multispectral satellite imagery for land cover mapping in the Ho Chi Minh city area, Vietnam
  • R.G. Congalton et al. Assessing the Accuracy of Remotely Sensed Data: Principles and Practices (2009)
  • D.R. Cutler et al. Random forests for classification in ecology. Ecology (2007)
  • E.R. Delong et al. Comparing the areas under two or more correlated receiver operating characteristic curve: a nonparametric approach. Biometrics (1988)
  • J.M. Drake et al. Modelling ecological niches with support vector machines. Journal of Applied Ecology (2006)
  • T. Eitrich et al. Parallel tuning of support vector machine learning parameters for large and unbalanced data sets
  • M. Fauvel et al. A combined support vector machines classification based on decision fusion
  • S. Ferrier. Mapping spatial pattern in biodiversity for regional conservation planning: where to from here? Systematic Biology (2002)
  • J. Florence et al. Base de données botaniques Nadeaud de l'Herbier de la Polynésie française
  • J. Franklin. Predictive vegetation mapping: geographic modeling of biospatial patterns in relation to environmental gradients. Progress in Physical Geography (1995)
  • S.E. Franklin et al. Spectral texture for improved class discrimination in complex terrain. International Journal of Remote Sensing (1989)
  • S.E. Franklin et al. Incorporating texture into classification of forest species composition from airborne multispectral images. International Journal of Remote Sensing (2000)
  • P.E. Gessler et al. Modeling soil-landscape and ecosystem properties using terrain attributes. Soil Science Society of America Journal (2000)
  • M.L. Grant et al. Partial flora of the Society Islands: Ericaceae to Apocynaceae
  • R.M. Haralick et al. Textural features for image classification. IEEE Transactions on Systems, Man, and Cybernetics (1973)
  • T.J. Hatton et al. Eagleson's optimality theory of an ecohydrological equilibrium: quo vadis? Functional Ecology (1997)
  • C.W. Hsu et al. A practical guide to support vector classification
