Elsevier

Geomorphology

Volume 94, Issues 3–4, 15 February 2008, Pages 438-452

Predicting landslides for risk analysis — Spatial models tested by a cross-validation technique

https://doi.org/10.1016/j.geomorph.2006.12.036

Abstract

A landslide-hazard map is intended to show the location of future slope instability. Most spatial models of the hazard lack reliability tests of the procedures and predictions for estimating the probabilities of future landslides, thus precluding use of the maps for probabilistic risk analysis. To correct this deficiency we propose a systematic procedure comprising two analytical steps: “relative-hazard mapping” and “empirical probability estimation”. A mathematical model first generates a prediction map by dividing an area into “prediction” classes according to the relative likelihood of occurrence of future landslides, conditional on local geomorphic and topographic characteristics. The second step estimates empirically the probability of landslide occurrence in each prediction class, by applying a cross-validation technique. Cross-validation, a “blind test” here using non-overlapping spatial or temporal subsets of mapped landslides, evaluates accuracy of the prediction and from the resulting statistics estimates occurrence probabilities of future landslides. This quantitative approach, exemplified by several experiments in an area near Lisbon, Portugal, can accommodate any subsequent analysis of landslide risk.

Introduction

Landslide-hazard maps are an essential tool for assessing landslide risk and contributing to public safety worldwide (Guzzetti et al., 1999; Glade et al., 2005). Predictive methods vary, from simple heuristic opinion-based procedures and few data to sophisticated mathematical models operating on complex databases with advanced software and hardware technology. Most of the resulting maps are based on geomorphology (Panizza et al., 1998) or soils engineering (Terlien et al., 1995; Soeters and van Westen, 1996). Predictive maps can guide land-use planning, but potential users of the information face problems of interpretation. If, for example, an area is mapped as hazardous, it may be difficult to decide on practical steps to take in that area; a ban on development would avoid damage from future landslides, but the consequences could have a serious economic impact. Decision-makers need not only relative levels of the hazard but also estimates of occurrence probabilities for future landslides at any location; landslide risk could then be estimated and some form of assessment or cost-benefit analysis performed to arrive at an informed decision.

To this end, we have further developed the two-stage approach of Chung and Fabbri (2003, 2004); here we estimate the probability of occurrence of future landslides given specific scenarios and likely geomorphologic and topographic factors. The first stage creates a hazard map with a number of classes; the second stage estimates the probability that a future landslide may occur in the areas defined by each class under the scenario. To test the predictions, a small sample of shallow translational failures mapped near Lisbon, Portugal, is divided into two mutually exclusive groups by time or space. One group is used to model a prediction with several levels of hazard; counting landslides in the other group that fall inside these hazard classes yields both statistics for assessing prediction reliability and estimates of occurrence probabilities of future failures.
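The counting step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `prediction_rate_table`, the equal-area split into `n_classes` classes, and the flat per-pixel lists are all assumptions made for illustration. Given hazard scores modeled from one group of landslides, it ranks the pixels, groups them into classes, and counts how many landslide pixels of the held-out group fall in each class.

```python
def prediction_rate_table(scores, validation_flags, n_classes=10):
    """Count held-out landslide pixels per hazard class.

    scores           -- hazard score per pixel, modeled from the first group
    validation_flags -- True where a landslide of the second (held-out) group occurs
    n_classes        -- number of equal-area hazard classes (an assumption here)
    Class 0 holds the highest-scoring pixels.
    """
    n = len(scores)
    # Pixel indices ranked from highest to lowest score.
    ranked = sorted(range(n), key=lambda i: -scores[i])

    counts = [0] * n_classes
    for rank, pixel in enumerate(ranked):
        hazard_class = rank * n_classes // n  # equal-area class of this rank
        if validation_flags[pixel]:
            counts[hazard_class] += 1

    # Cumulative share of held-out landslide pixels captured down to each class.
    total = sum(counts)
    cumulative = []
    running = 0
    for c in counts:
        running += c
        cumulative.append(running / total if total else 0.0)
    return counts, cumulative
```

If most held-out landslides fall in the first few classes, the cumulative shares rise steeply, which is the kind of statistic the text uses to judge prediction reliability.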

We modeled all predictions from calculations on a multivariate geo-referenced database compiled initially by Zêzere (1996); input data are scars (source areas, not deposits) of the mapped landslides as well as categorical (thematic) and continuous variables believed to determine the location of landslides. Several favorability function models have been developed to construct the predictions (Chung and Fabbri, 2004). The model used here is the fuzzy-set membership function based on the likelihood ratio function; problems in combining categorical and continuous data and data of different spatial resolutions or map scales are addressed by Chung (2006). This paper presents experimental results of our statistical approach to modeling landslide prediction, with emphasis on model accuracy, for several likely scenarios in which we explore the flexibility of the techniques employed, highlight operational issues, and point out caveats on their application.
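As a rough illustration of the likelihood ratio underlying the model, the sketch below computes an empirical ratio for a single categorical variable: the relative frequency of each category among landslide pixels divided by its relative frequency among landslide-free pixels. The function name `likelihood_ratios` and the input layout are hypothetical, and the step that turns these ratios into the fuzzy-set membership function is not shown in this excerpt, so it is not attempted here.

```python
from collections import Counter

def likelihood_ratios(categories, landslide_flags):
    """Empirical likelihood ratio per category (illustrative assumption).

    categories      -- category label of each pixel (e.g. a lithology unit)
    landslide_flags -- True where the pixel lies in a mapped landslide scar
    """
    in_slide = Counter(c for c, f in zip(categories, landslide_flags) if f)
    out_slide = Counter(c for c, f in zip(categories, landslide_flags) if not f)
    n_in = sum(in_slide.values())
    n_out = sum(out_slide.values())
    if n_in == 0 or n_out == 0:
        raise ValueError("need both landslide and landslide-free pixels")

    ratios = {}
    for c in set(categories):
        p_in = in_slide[c] / n_in     # frequency within landslide pixels
        p_out = out_slide[c] / n_out  # frequency within landslide-free pixels
        # A ratio above 1 means the category is over-represented in landslides.
        ratios[c] = p_in / p_out if p_out > 0 else float("inf")
    return ratios
```

A category with a ratio near 1 carries little information about landslide location, whereas ratios far from 1 indicate that the variable separates the two kinds of terrain.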

Section snippets

Study area and modeling database

Our test site is the 3.8 km × 3.5 km Fanhões-Trancão area, just north of Lisbon, Portugal (Fig. 1A). The 13.3 km² site lies entirely below 350 m elevation; its topography evolved on sedimentary and volcanic strata of the Lousa-Bucelas cuesta, in the Portuguese Meso-Cenozoic sedimentary basin close to its contact with the Tagus alluvial plain (Fig. 1B; see also Zêzere et al., 2004a, 2004b, and this volume). Although mean annual precipitation in the area is

Favorability function model for relative-hazard mapping

The necessary statistical procedures are developed here and in Section 4. Consider a pixel x in the study area with values (x1,…, xk, y1,…, yh); the first k values, x1,…, xk correspond to categorical variables and subsequent h values, y1,…, yh represent continuous variables, one index for each variable; also let Y(x) represent the presence (Y(x) = 1) or absence (Y(x) = 0) of a landslide at pixel x. Following Chung and Fabbri (2003, 2004) and Chung (2006), suppose that for every

Fuzzy-set model and likelihood ratio function

To predict future landslides from the explanatory variables we divide the study area into two non-overlapping sub-areas, one containing landslide scars and the other none. To provide useful information for identifying landslide locations, the six explanatory variables in landslide areas should have ranges of values that differ significantly from those in landslide-free terrain. Frequency distributions of variables for the two areas thus should also be distinctly different; the likelihood ratio

Generating hazard classes for a prediction map

Estimating the fuzzy membership function in Eq. (5) at every pixel in the study area generates a level of hazard ranging from 0 to 1. Because these values express relative levels of hazard, they can be replaced by their ranks (or orders) instead of the actual membership-function scores. The Fanhões-Trancão area contains 532,000 pixels (760 × 700); using expression (5) with the identical function h(a) = a results in 532,000 estimated scores. These scores are first sorted in decreasing order and then

Cross-validation applied to a time-divided sample

Cross-validation (Geisser, 1974) is one of several statistical “resampling” techniques used to test the strength of a model prediction. “Validation” is used here solely as a technical term-of-art in testing for model goodness-of-fit and thus judging acceptance; no connotation of absolute “truth” in the strictest sense is intended (Sterman et al., 1994). To evaluate whether the prediction map generated is useful for recognizing the locations of future landslides in the Fanhões-Trancão area, we

Cross-validation applied to a spatial partitioning

Unexpected rainstorms can create disastrous mass movements in Portugal (Zêzere et al., 2005); given a landslide-hazard prediction map of such an area, can we extend the prediction to another area of similar geomorphic and topographic properties, although a severe storm has not yet occurred? Can we reliably identify the locations to be affected by landslides in the event of future rainstorms (SpatialModels Inc., 2004)? The cross-validation technique based on a spatial partition would be able to

Cross-validation for combined time and spatial partitionings

To more realistically evaluate a space-partitioned model, we can combine divisions by both time and space. Briefly, two mutually exclusive regions are selected as before and the landslides in each divided separately by time of occurrence; earlier landslides in the modeling region are used to build a prediction model, which is then extended to the evaluation region to create a prediction map. This map is compared with the distribution of landslides from the later period in the evaluation region

Estimating occurrence probabilities of future landslides

The ultimate goal of landslide prediction, risk of slope instability at a given location, can be computed by estimating the occurrence probability of future landslides on each pixel of a hazard class from the prediction-rate table, plus two additional parameters. One is the cumulative number of pixels up to the class (equivalently, area on the ground corresponding to all classes whose hazard levels are greater than or equal to the class); the other is the expected frequency and size of the

Discussion and prospect

We have demonstrated a two-stage approach to testing the accuracy of a spatial prediction by a cross-validation technique. A fuzzy-set model has established spatial relations between the occurrences of landslides and explanatory observations to obtain hazard classes in a prediction map. Different spatial and temporal subdivisions of the occurrences and of the study area permit a goodness-of-fit test of the predictions. Comparison of a temporal prediction, which serves as a baseline, with

Acknowledgments

This research is partly supported by the “Pathways” project under the SDKI Program (Sustainable Development through Knowledge Integration) of the Earth Sciences Sector of Natural Resources Canada. Partial support also is acknowledged for a research network project on the “Assessment of Landslide Risk and Mitigation in Mountain Areas, ALARM” (Contract EVG1-CT-2001-00038) of the European Commission's Fifth Framework Programme (http://www.spinlab.vu.nl/alarm). In addition, the authors thank
