Predicting landslides for risk analysis — Spatial models tested by a cross-validation technique
Introduction
Landslide-hazard maps are an essential tool for assessing landslide risk and contributing to public safety worldwide (Guzzetti et al., 1999, Glade et al., 2005). Predictive methods range from simple, heuristic, opinion-based procedures that use few data to sophisticated mathematical models that operate on complex databases with advanced software and hardware. Most of the resulting maps are based on geomorphology (Panizza et al., 1998) or soils engineering (Terlien et al., 1995, Soeters and van Westen, 1996). Predictive maps can guide land-use planning, but potential users of the information face problems of interpretation. If, for example, an area is mapped as hazardous, it may be difficult to decide what practical steps to take there: a ban on development would avoid damage from future landslides, but it could have a serious economic impact. Decision-makers need not only relative levels of hazard but also estimates of the probability that future landslides will occur at any location; landslide risk could then be estimated and some form of assessment or cost-benefit analysis performed to arrive at an informed decision.
To this end, we have further developed the two-stage approach of Chung and Fabbri (2003, 2004); here we estimate the probability of occurrence of future landslides given specific scenarios and likely geomorphologic and topographic factors. The first stage creates a hazard map with a number of classes; the second stage estimates the probability that a future landslide may occur in the areas defined by each class under the scenario. To test the predictions, a small sample of shallow translational failures mapped near Lisbon, Portugal, is divided into two mutually exclusive groups by time or space. One group is used to model a prediction with several levels of hazard; counting landslides in the other group that fall inside these hazard classes yields both statistics for assessing prediction reliability and estimates of occurrence probabilities of future failures.
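The counting step described above can be sketched in Python. This is a minimal illustration, not the authors' implementation: it assumes each pixel already carries a hazard-class label (1 = most hazardous) and that the validation group is given as a list of pixel identifiers; the function name `prediction_rate_table` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def prediction_rate_table(
    pixel_class: Dict[int, int],
    validation_pixels: Iterable[int],
    n_classes: int,
) -> List[Tuple[int, float, float]]:
    """Share of validation-landslide pixels falling in each hazard class.

    pixel_class maps a pixel id to its hazard class (1 = highest hazard).
    validation_pixels lists pixels covered by landslides that were NOT
    used to build the prediction. Returns (class, rate, cumulative rate).
    """
    counts = Counter(pixel_class[p] for p in validation_pixels)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("validation group contains no landslide pixels")
    table, cumulative = [], 0.0
    for c in range(1, n_classes + 1):
        rate = counts.get(c, 0) / total
        cumulative += rate
        table.append((c, rate, cumulative))
    return table


# Toy example: six pixels in three classes, four validation pixels.
classes = {0: 1, 1: 1, 2: 2, 3: 2, 4: 3, 5: 3}
print(prediction_rate_table(classes, [0, 1, 2, 5], n_classes=3))
# [(1, 0.5, 0.5), (2, 0.25, 0.75), (3, 0.25, 1.0)]
```

A steep rise of the cumulative rate over the first few classes indicates that the prediction concentrates the held-out landslides in a small part of the area.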
We modeled all predictions from calculations on a multivariate geo-referenced database compiled initially by Zêzere (1996); input data are scars (source areas, not deposits) of the mapped landslides as well as categorical (thematic) and continuous variables believed to determine the location of landslides. Several favorability function models have been developed to construct the predictions (Chung and Fabbri, 2004). The model used here is the fuzzy-set membership function based on the likelihood ratio function; problems in combining categorical and continuous data and data of different spatial resolutions or map scales are addressed by Chung (2006). This paper presents experimental results of our statistical approach to modeling landslide prediction, with emphasis on model accuracy, for several likely scenarios in which we explore the flexibility of the techniques employed, highlight operational issues, and point out caveats on their application.
Study area and modeling database
Our test site is the 3.8 km × 3.5 km Fanhões-Trancão area, just north of Lisbon, Portugal (Fig. 1A). The 13.3 km² site lies entirely below 350 m elevation; its topography evolved on sedimentary and volcanic strata of the Lousa-Bucelas cuesta, in the Portuguese Meso-Cenozoic sedimentary basin close to its contact with the Tagus alluvial plain (Fig. 1B; see also Zêzere et al., 2004a, 2004b, and this volume). Although mean annual precipitation in the area is
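As a quick arithmetic check, the stated extent matches the quoted area, and combining it with the 760 × 700 pixel grid used for the prediction maps implies square cells of about 5 m. The cell size is our inference from these figures (assuming the 760 columns span the 3.8 km side), not a value stated in this section.

```python
# Figures quoted in the text for the Fanhões-Trancão study area.
width_km, height_km = 3.8, 3.5
cols, rows = 760, 700  # raster grid used for the prediction maps

area_km2 = width_km * height_km            # ~13.3 km²
n_pixels = cols * rows                     # 532,000 pixels
cell_w_m = width_km * 1000 / cols          # implied cell width (inference)
cell_h_m = height_km * 1000 / rows         # implied cell height (inference)

print(f"{area_km2:.1f} km², {n_pixels:,} pixels, "
      f"{cell_w_m:.1f} m x {cell_h_m:.1f} m cells")
```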
Favorability function model for relative-hazard mapping
The necessary statistical procedures are developed here and in Section 4. Consider a pixel x in the study area with values (x1, …, xk, y1, …, yh); the first k values, x1, …, xk, correspond to categorical variables and the subsequent h values, y1, …, yh, to continuous variables, one index for each variable; also let Y(x) represent the presence (Y(x) = 1) or absence (Y(x) = 0) of a landslide at pixel x. Following Chung and Fabbri (2003, 2004) and Chung (2006), suppose that for every
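The notation can be mirrored in a small data structure: each pixel carries k categorical values, h continuous values, and the indicator Y(x). The class and field names below are hypothetical, and the helper simply separates the landslide and landslide-free pixels that the likelihood-ratio step compares.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Pixel:
    categorical: Tuple[str, ...]   # x1, ..., xk: thematic map classes
    continuous: Tuple[float, ...]  # y1, ..., yh: e.g. slope, elevation
    landslide: int                 # Y(x): 1 if a landslide scar is present, else 0


def split_by_presence(pixels: List[Pixel]) -> Tuple[List[Pixel], List[Pixel]]:
    """Return (pixels with Y(x) = 1, pixels with Y(x) = 0)."""
    present = [p for p in pixels if p.landslide == 1]
    absent = [p for p in pixels if p.landslide == 0]
    return present, absent


# Hypothetical pixels with k = 1 categorical and h = 2 continuous variables.
sample = [
    Pixel(("unit_A",), (12.5, 140.0), 1),
    Pixel(("unit_B",), (3.0, 95.0), 0),
    Pixel(("unit_A",), (20.1, 180.0), 0),
]
with_slides, without_slides = split_by_presence(sample)
print(len(with_slides), len(without_slides))  # 1 2
```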
Fuzzy-set model and likelihood ratio function
To predict future landslides from the explanatory variables we divide the study area into two non-overlapping sub-areas, one containing landslide scars and the other none. To provide useful information for identifying landslide locations, the six explanatory variables in landslide areas should have ranges of values that differ significantly from those in landslide-free terrain. Frequency distributions of variables for the two areas thus should also be distinctly different; the likelihood ratio
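For a single categorical layer, one common form of the likelihood ratio divides the proportion of landslide pixels falling in a map class by the proportion of landslide-free pixels in that class; values above 1 flag classes over-represented under landslides. This is a minimal sketch of that idea, not a reproduction of Chung (2006), which also treats continuous variables; the function name and the toy labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence


def likelihood_ratios(categories: Sequence[str],
                      presence: Sequence[int]) -> Dict[str, float]:
    """Per-class ratio f1(c) / f0(c), where f1 and f0 are the relative
    frequencies of class c among landslide and landslide-free pixels."""
    in_slide = Counter(c for c, y in zip(categories, presence) if y == 1)
    no_slide = Counter(c for c, y in zip(categories, presence) if y == 0)
    n1, n0 = sum(in_slide.values()), sum(no_slide.values())
    if n1 == 0 or n0 == 0:
        raise ValueError("need both landslide and landslide-free pixels")
    ratios = {}
    for c in sorted(set(categories)):
        f1 = in_slide[c] / n1
        f0 = no_slide[c] / n0
        # A class never seen outside landslides gets an infinite ratio.
        ratios[c] = f1 / f0 if f0 > 0 else float("inf")
    return ratios


# Toy layer: eight pixels, three map classes, three landslide pixels.
labels = ["A", "A", "A", "B", "B", "B", "B", "C"]
y = [1, 1, 0, 1, 0, 0, 0, 0]
print(likelihood_ratios(labels, y))
```

Class A, where two of the three landslide pixels fall, gets a ratio well above 1, while class C, with no landslides, gets 0.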
Generating hazard classes for a prediction map
Estimating the fuzzy membership function in Eq. (5) at every pixel in the study area generates a level of hazard ranging from 0 to 1. Because these values express relative levels of hazard, they can be replaced by their ranks (or orders) instead of the actual membership-function scores. The Fanhões-Trancão area contains 532,000 pixels (760 × 700); using Eq. (5) with the identity function h(a) = a results in 532,000 estimated scores. These scores are first sorted in decreasing order and then
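The ranking step can be sketched as follows. The section text breaks off after the sort, so grouping the ranked pixels into classes of (nearly) equal area is our assumption here, as are the function name and the tie-breaking rule (tied scores keep their original order).

```python
from typing import List, Sequence


def hazard_classes(scores: Sequence[float], n_classes: int) -> List[int]:
    """Replace scores by ranks and cut the ranking into n_classes groups of
    (nearly) equal pixel count; class 1 holds the highest scores."""
    n = len(scores)
    # Pixel indices sorted by decreasing score; sorted() is stable even with
    # reverse=True, so tied scores keep their original index order.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    classes = [0] * n
    for rank, idx in enumerate(order):
        classes[idx] = rank * n_classes // n + 1
    return classes


print(hazard_classes([0.1, 0.9, 0.5, 0.7, 0.3, 0.2], n_classes=3))
# [3, 1, 2, 1, 2, 3]
```

Because only the ranks matter, any strictly increasing transformation of the membership scores yields the same classes.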
Cross-validation applied to a time-divided sample
Cross-validation (Geisser, 1974) is one of several statistical “resampling” techniques used to test the strength of a model prediction. “Validation” is used here solely as a technical term-of-art in testing for model goodness-of-fit and thus judging acceptance; no connotation of absolute “truth” in the strictest sense is intended (Sterman et al., 1994). To evaluate whether the prediction map generated is useful for recognizing the locations of future landslides in the Fanhões-Trancão area, we
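The time-divided partition can be sketched as below: landslides that occurred before a chosen cutoff form the modeling group and the later ones form the validation group. The record layout, the `year` field, the cutoff and the sample years are hypothetical placeholders, not the dates of the Fanhões-Trancão events.

```python
from typing import Dict, List, Tuple

Landslide = Dict[str, int]  # hypothetical record: {"id": ..., "year": ...}


def split_by_time(landslides: List[Landslide], cutoff_year: int
                  ) -> Tuple[List[Landslide], List[Landslide]]:
    """Modeling group: year < cutoff_year; validation group: the rest."""
    modeling = [s for s in landslides if s["year"] < cutoff_year]
    validation = [s for s in landslides if s["year"] >= cutoff_year]
    return modeling, validation


events = [{"id": 1, "year": 2001}, {"id": 2, "year": 2002},
          {"id": 3, "year": 2005}, {"id": 4, "year": 2006}]
model_set, check_set = split_by_time(events, cutoff_year=2004)
print([s["id"] for s in model_set], [s["id"] for s in check_set])  # [1, 2] [3, 4]
```

The prediction is built from `model_set` only; `check_set` is then counted class by class to obtain prediction rates.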
Cross-validation applied to a spatial partitioning
Unexpected rainstorms can create disastrous mass movements in Portugal (Zêzere et al., 2005); given a landslide-hazard prediction map of such an area, can we extend the prediction to another area of similar geomorphic and topographic properties, although a severe storm has not yet occurred? Can we reliably identify the locations to be affected by landslides in the event of future rainstorms (SpatialModels Inc., 2004)? The cross-validation technique based on a spatial partition would be able to
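A spatial partition can be sketched on the raster grid: pixels are assigned to a modeling region or an evaluation region, the model is fitted using only modeling-region pixels, and its scores are then computed and checked in the evaluation region. A straight north-south boundary at a given column is a hypothetical simplification; the regions used in the study need not be shaped this way.

```python
from typing import List, Tuple

Cell = Tuple[int, int]  # (row, column)


def split_grid(rows: int, cols: int, boundary_col: int
               ) -> Tuple[List[Cell], List[Cell]]:
    """Cells west of boundary_col form the modeling region; the remaining
    cells form the evaluation region. The two regions never overlap."""
    modeling: List[Cell] = []
    evaluation: List[Cell] = []
    for r in range(rows):
        for c in range(cols):
            (modeling if c < boundary_col else evaluation).append((r, c))
    return modeling, evaluation


model_cells, eval_cells = split_grid(rows=2, cols=4, boundary_col=1)
print(len(model_cells), len(eval_cells))  # 2 6
```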
Cross-validation for combined time and spatial partitionings
To more realistically evaluate a space-partitioned model, we can combine divisions by both time and space. Briefly, two mutually exclusive regions are selected as before and the landslides in each divided separately by time of occurrence; earlier landslides in the modeling region are used to build a prediction model, which is then extended to the evaluation region to create a prediction map. This map is compared with the distribution of landslides from the later period in the evaluation region
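Combining both divisions amounts to a double filter: the modeling set keeps earlier landslides in the modeling region and the validation set keeps later landslides in the evaluation region, so later events in the modeling region and earlier events in the evaluation region play no part in this comparison. Field names, the column boundary and the sample records are hypothetical.

```python
from typing import Dict, List, Tuple

Landslide = Dict[str, int]  # hypothetical record: {"id", "year", "col"}


def split_time_and_space(landslides: List[Landslide], cutoff_year: int,
                         boundary_col: int
                         ) -> Tuple[List[Landslide], List[Landslide]]:
    """Modeling set: earlier events west of boundary_col.
    Validation set: later events at or east of boundary_col."""
    modeling = [s for s in landslides
                if s["col"] < boundary_col and s["year"] < cutoff_year]
    validation = [s for s in landslides
                  if s["col"] >= boundary_col and s["year"] >= cutoff_year]
    return modeling, validation


events = [
    {"id": 1, "year": 2001, "col": 100},  # early, modeling region -> used
    {"id": 2, "year": 2006, "col": 120},  # late, modeling region -> unused
    {"id": 3, "year": 2002, "col": 500},  # early, evaluation region -> unused
    {"id": 4, "year": 2007, "col": 640},  # late, evaluation region -> used
]
m, v = split_time_and_space(events, cutoff_year=2004, boundary_col=380)
print([s["id"] for s in m], [s["id"] for s in v])  # [1] [4]
```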
Estimating occurrence probabilities of future landslides
The ultimate goal of landslide prediction, the risk of slope instability at a given location, can be computed by estimating the occurrence probability of future landslides on each pixel of a hazard class from the prediction-rate table plus two additional parameters. One is the cumulative number of pixels up to the class (equivalently, the area on the ground corresponding to all classes whose hazard levels are greater than or equal to that of the class); the other is the expected frequency and size of the
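The conversion from prediction rates to probabilities can be sketched under an explicit simplifying assumption: future landslides will cover an expected number of pixels (expected frequency times expected size) and will spread over the hazard classes in the same proportions as the validation landslides did. Per-class rates and areas are recovered by differencing the cumulative columns of the prediction-rate table. This is our simplified reading, not the authors' exact formula; the function name and all numbers are hypothetical.

```python
from typing import List, Sequence


def occurrence_probabilities(cum_rates: Sequence[float],
                             cum_pixels: Sequence[int],
                             n_events: float,
                             pixels_per_event: float) -> List[float]:
    """Approximate probability that a pixel in each hazard class is hit by
    a future landslide, assuming future landslide area splits across
    classes like the validation landslides (per-class prediction rate)."""
    expected_pixels = n_events * pixels_per_event
    probs: List[float] = []
    prev_rate, prev_pix = 0.0, 0
    for rate, pix in zip(cum_rates, cum_pixels):
        class_rate = rate - prev_rate      # share of landslide area in class
        class_pixels = pix - prev_pix      # number of pixels in class
        p = class_rate * expected_pixels / class_pixels
        probs.append(min(p, 1.0))          # a probability cannot exceed 1
        prev_rate, prev_pix = rate, pix
    return probs


# Three classes covering 100, 200 and 300 pixels; 4 future events of ~20 pixels.
print(occurrence_probabilities([0.5, 0.75, 1.0], [100, 300, 600],
                               n_events=4, pixels_per_event=20))
```

With 80 expected landslide pixels, half of which fall in the 100-pixel top class, each pixel of that class has an estimated probability of 0.4, against 0.1 in the second class.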
Discussion and prospect
We have demonstrated a two-stage approach to testing the accuracy of a spatial prediction by a cross-validation technique. A fuzzy-set model has established spatial relations between the occurrences of landslides and explanatory observations to obtain hazard classes in a prediction map. Different spatial and temporal subdivisions of the occurrences and of the study area permit a goodness-of-fit test of the predictions. Comparison of a temporal prediction, which serves as a baseline, with
Acknowledgments
This research is partly supported by the “Pathways” project under the SDKI Program (Sustainable Development through Knowledge Integration) of the Earth Sciences Sector of Natural Resources Canada. Partial support also is acknowledged for a research network project on the “Assessment of Landslide Risk and Mitigation in Mountain Areas, ALARM” (Contract EVG1-CT-2001-00038) of the European Commission's Fifth Framework Programme (http://www.spinlab.vu.nl/alarm). In addition, the authors thank
References (22)
Using likelihood ratio functions for modeling the conditional probability of occurrence of future landslides for risk assessment. Computers & Geosciences (2006).
Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology (1999).
Discriminant Analysis and Applications (1973).
Validation of spatial prediction models for landslide hazard mapping. Natural Hazards (2003).
Systematic procedures of landslide hazard mapping for risk assessment using spatial prediction models.
A strategy for sustainable development of nonrenewable resources using spatial prediction models.
Analysis of Binary Data (1970).
A predictive approach to the random effect model. Biometrika (1974).
Discrete Distributions (1969).
Multivariate Analysis.