Predicting landslides for risk analysis — Spatial models tested by a cross-validation technique

doi:10.1016/j.geomorph.2006.12.036

Geomorphology

Volume 94, Issues 3–4, 15 February 2008, Pages 438-452

https://doi.org/10.1016/j.geomorph.2006.12.036

Abstract

A landslide-hazard map is intended to show the location of future slope instability. Most spatial models of the hazard lack reliability tests of the procedures and predictions for estimating the probabilities of future landslides, thus precluding use of the maps for probabilistic risk analysis. To correct this deficiency we propose a systematic procedure comprising two analytical steps: “relative-hazard mapping” and “empirical probability estimation”. In the first step, a mathematical model generates a prediction map by dividing an area into “prediction” classes according to the relative likelihood of occurrence of future landslides, conditional on local geomorphic and topographic characteristics. The second step estimates empirically the probability of landslide occurrence in each prediction class by applying a cross-validation technique. Cross-validation, a “blind test” here using non-overlapping spatial or temporal subsets of mapped landslides, evaluates the accuracy of the prediction and, from the resulting statistics, estimates occurrence probabilities of future landslides. This quantitative approach, exemplified by several experiments in an area near Lisbon, Portugal, can accommodate any subsequent analysis of landslide risk.

Introduction

Landslide-hazard maps are an essential tool for assessing landslide risk and contributing to public safety worldwide (Guzzetti et al., 1999; Glade et al., 2005). Predictive methods range from simple, heuristic, opinion-based procedures using few data to sophisticated mathematical models operating on complex databases with advanced software and hardware technology. Most of the resulting maps are based on geomorphology (Panizza et al., 1998) or soils engineering (Terlien et al., 1995; Soeters and van Westen, 1996). Predictive maps can guide land-use planning, but potential users of the information face problems of interpretation. If, for example, an area is mapped as hazardous, it may be difficult to decide on practical steps to take in that area; a ban on development would avoid damage from future landslides, but it could have a serious economic impact. Decision-makers need not only relative levels of the hazard but also estimates of occurrence probabilities for future landslides at any location; landslide risk could then be estimated and some form of assessment or cost-benefit analysis performed to arrive at an informed decision.

To this end, we have further developed the two-stage approach of Chung and Fabbri (2003, 2004); here we estimate the probability of occurrence of future landslides given specific scenarios and likely geomorphologic and topographic factors. The first stage creates a hazard map with a number of classes; the second stage estimates the probability that a future landslide may occur in the areas defined by each class under the scenario. To test the predictions, a small sample of shallow translational failures mapped near Lisbon, Portugal, is divided into two mutually exclusive groups by time or space. One group is used to model a prediction with several levels of hazard; counting landslides in the other group that fall inside these hazard classes yields both statistics for assessing prediction reliability and estimates of occurrence probabilities of future failures.

We modeled all predictions from calculations on a multivariate geo-referenced database compiled initially by Zêzere (1996); input data are scars (source areas, not deposits) of the mapped landslides as well as categorical (thematic) and continuous variables believed to determine the location of landslides. Several favorability function models have been developed to construct the predictions (Chung and Fabbri, 2004). The model used here is the fuzzy-set membership function based on the likelihood ratio function; problems in combining categorical and continuous data and data of different spatial resolutions or map scales are addressed by Chung (2006). This paper presents experimental results of our statistical approach to modeling landslide prediction, with emphasis on model accuracy, for several likely scenarios in which we explore the flexibility of the techniques employed, highlight operational issues, and point out caveats on their application.

Section snippets

Study area and modeling database

Our test site is the 3.8 km × 3.5 km Fanhões-Trancão area, just north of Lisbon, Portugal (Fig. 1A). The 13.3 km² site lies entirely below 350 m elevation; its topography evolved on sedimentary and volcanic strata of the Lousa-Bucelas cuesta, in the Portuguese Meso-Cenozoic sedimentary basin close to its contact with the Tagus alluvial plain (Fig. 1B; see also Zêzere et al., 2004a, 2004b, and this volume). Although mean annual precipitation in the area is

Favorability function model for relative-hazard mapping

The necessary statistical procedures are developed here and in Section 4. Consider a pixel x in the study area with values (x₁,…, x_k, y₁,…, y_h); the first k values, x₁,…, x_k, correspond to categorical variables and the subsequent h values, y₁,…, y_h, represent continuous variables, one index for each variable; also let Y(x) represent the presence (Y(x) = 1) or absence (Y(x) = 0) of a landslide at pixel x. Following Chung and Fabbri (2003, 2004) and Chung (2006), suppose that for every

Fuzzy-set model and likelihood ratio function

To predict future landslides from the explanatory variables we divide the study area into two non-overlapping sub-areas, one containing landslide scars and the other none. To provide useful information for identifying landslide locations, the six explanatory variables in landslide areas should have ranges of values that differ significantly from those in landslide-free terrain. Frequency distributions of variables for the two areas thus should also be distinctly different; the likelihood ratio
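
As a minimal sketch of the idea behind a likelihood ratio, assume a single categorical variable stored as integer class codes and a boolean mask of landslide-scar pixels. Both arrays, the function name `likelihood_ratio`, and the toy values are hypothetical and are not the authors' data or code. The ratio for each class is estimated as its relative frequency in scarred pixels divided by its relative frequency in landslide-free pixels.

```python
import numpy as np

def likelihood_ratio(classes, landslide, n_classes):
    """Empirical likelihood ratio per class of one categorical variable.

    classes   : 1-D int array, class code (0..n_classes-1) of each pixel
    landslide : 1-D bool array, True where the pixel lies in a landslide scar
    Returns an array whose entry c estimates
        P(class = c | landslide) / P(class = c | no landslide),
    or NaN where the class never occurs in landslide-free terrain.
    """
    classes = np.asarray(classes)
    landslide = np.asarray(landslide, dtype=bool)
    # Relative frequency of each class inside and outside landslide scars.
    f_slide = np.bincount(classes[landslide], minlength=n_classes) / landslide.sum()
    f_free = np.bincount(classes[~landslide], minlength=n_classes) / (~landslide).sum()
    ratio = np.full(n_classes, np.nan)
    np.divide(f_slide, f_free, out=ratio, where=f_free > 0)
    return ratio

# Toy example (made-up values): 10 pixels, 3 classes, 4 landslide pixels.
classes = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2, 2])
landslide = np.array([False, False, False, False, True,
                      False, True, True, True, False])
lr = likelihood_ratio(classes, landslide, n_classes=3)
```

A ratio above 1 marks a class that is relatively more common in landslide terrain than in landslide-free terrain; a ratio near 1 indicates that the variable contributes little to separating the two sub-areas.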

Generating hazard classes for a prediction map

Estimating the fuzzy membership function in Eq. (5) at every pixel in the study area generates a level of hazard ranging from 0 to 1. Because these values express relative levels of hazard, the actual membership-function scores can be replaced by their ranks (or orders). The Fanhões-Trancão area contains 532,000 pixels (760 × 700); using expression (5) with the identity function h(a) = a results in 532,000 estimated scores. These scores are first sorted in decreasing order and then
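
The rank-based step can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the scores are random stand-ins, the function name `hazard_classes` is hypothetical, and the choice of 200 equal-area classes is arbitrary, since the excerpt ends before the class definition is given.

```python
import numpy as np

def hazard_classes(scores, n_classes):
    """Replace scores by ranks and cut the ranks into equal-area classes.

    scores    : 2-D float array of membership scores (higher = more hazardous)
    n_classes : number of equal-area classes (an illustrative choice)
    Returns an int array of the same shape; class 0 holds the most
    hazardous pixels, class n_classes-1 the least hazardous.
    """
    flat = scores.ravel()
    # Sort in decreasing order of score; a stable sort keeps ties reproducible.
    order = np.argsort(-flat, kind="stable")
    ranks = np.empty(flat.size, dtype=np.int64)
    ranks[order] = np.arange(flat.size)        # rank 0 = highest score
    classes = ranks * n_classes // flat.size   # equal-count bins of ranks
    return classes.reshape(scores.shape)

rng = np.random.default_rng(0)
scores = rng.random((760, 700))                # stand-in for the 532,000 scores
classes = hazard_classes(scores, n_classes=200)
```

Because only the ordering of the scores is used, any monotonically increasing transformation of the membership function would produce the same classes.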

Cross-validation applied to a time-divided sample

Cross-validation (Geisser, 1974) is one of several statistical “resampling” techniques used to test the strength of a model prediction. “Validation” is used here solely as a technical term-of-art in testing for model goodness-of-fit and thus judging acceptance; no connotation of absolute “truth” in the strictest sense is intended (Sterman et al., 1994). To evaluate whether the prediction map generated is useful for recognizing the locations of future landslides in the Fanhões-Trancão area, we
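
A time-divided blind test of this kind can be summarized in a prediction-rate table. The sketch below assumes a hazard-class raster built only from the earlier group of landslides and a boolean mask of the later group; the function name `prediction_rate_table` and the toy arrays are hypothetical.

```python
import numpy as np

def prediction_rate_table(hazard_class, later_slides, n_classes):
    """Cumulative share of validation landslide pixels per hazard class.

    hazard_class : int array, 0 = most hazardous class, built ONLY from the
                   earlier (modeling) group of landslides
    later_slides : bool array, True on pixels of the later (validation) group
    Returns (cum_area, cum_rate): cumulative fraction of the study area and
    cumulative fraction of validation landslide pixels, class by class.
    """
    hazard_class = np.asarray(hazard_class).ravel()
    later_slides = np.asarray(later_slides, dtype=bool).ravel()
    area = np.bincount(hazard_class, minlength=n_classes)
    hits = np.bincount(hazard_class[later_slides], minlength=n_classes)
    cum_area = np.cumsum(area) / area.sum()
    cum_rate = np.cumsum(hits) / hits.sum()
    return cum_area, cum_rate

# Toy map (made-up values): 4 equal-area classes of 5 pixels each and
# 4 validation landslide pixels, three in class 0 and one in class 1.
hazard_class = np.repeat(np.arange(4), 5)
later_slides = np.zeros(20, dtype=bool)
later_slides[[0, 1, 3, 7]] = True
cum_area, cum_rate = prediction_rate_table(hazard_class, later_slides, 4)
```

Plotting `cum_rate` against `cum_area` gives a prediction-rate curve; the more steeply it rises above the diagonal, the better the map built from earlier landslides anticipates the later ones.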

Cross-validation applied to a spatial partitioning

Unexpected rainstorms can create disastrous mass movements in Portugal (Zêzere et al., 2005); given a landslide-hazard prediction map of such an area, can we extend the prediction to another area of similar geomorphic and topographic properties, although a severe storm has not yet occurred? Can we reliably identify the locations to be affected by landslides in the event of future rainstorms (SpatialModels Inc., 2004)? The cross-validation technique based on a spatial partition would be able to

Cross-validation for combined time and spatial partitionings

To more realistically evaluate a space-partitioned model, we can combine divisions by both time and space. Briefly, two mutually exclusive regions are selected as before and the landslides in each divided separately by time of occurrence; earlier landslides in the modeling region are used to build a prediction model, which is then extended to the evaluation region to create a prediction map. This map is compared with the distribution of landslides from the later period in the evaluation region
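
The combined partition described here amounts to selecting two disjoint sets of landslide pixels. The following sketch assumes per-pixel arrays for region membership, year of occurrence, and landslide presence; the function name `split_time_space`, the cutoff year, and all values are made up for illustration.

```python
import numpy as np

def split_time_space(region, year, is_slide, cutoff_year):
    """Select modeling and evaluation landslide pixels.

    region      : int array, 0 = modeling region, 1 = evaluation region
    year        : int array, year of occurrence (ignored where is_slide is False)
    is_slide    : bool array, True on landslide-scar pixels
    cutoff_year : landslides up to and including this year count as "earlier"
    """
    region = np.asarray(region)
    year = np.asarray(year)
    is_slide = np.asarray(is_slide, dtype=bool)
    # Earlier landslides in the modeling region build the prediction model.
    model_set = is_slide & (region == 0) & (year <= cutoff_year)
    # Later landslides in the evaluation region test the extended prediction.
    eval_set = is_slide & (region == 1) & (year > cutoff_year)
    return model_set, eval_set

# Toy data (hypothetical years): six pixels, five of them landslide scars.
region = np.array([0, 0, 0, 1, 1, 1])
year = np.array([1970, 1990, 0, 1970, 1990, 1995])
is_slide = np.array([True, True, False, True, True, True])
model_set, eval_set = split_time_space(region, year, is_slide, cutoff_year=1980)
```

Landslides that are later in the modeling region or earlier in the evaluation region are deliberately left out of both sets, so the test stays blind in time as well as in space.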

Estimating occurrence probabilities of future landslides

The ultimate goal of landslide prediction, risk of slope instability at a given location, can be computed by estimating the occurrence probability of future landslides on each pixel of a hazard class from the prediction-rate table, plus two additional parameters. One is the cumulative number of pixels up to the class (equivalently, area on the ground corresponding to all classes whose hazard levels are greater than or equal to the class); the other is the expected frequency and size of the
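
One way to combine these three quantities is sketched below. It is an assumption for illustration and not necessarily the authors' estimator: it supposes that each future landslide pixel falls independently, that the cumulative prediction rate gives the share of future landslide pixels landing in the classes considered, and that the expected number of future landslide pixels (frequency times mean size) is supplied by the analyst. The function name and the numbers are hypothetical.

```python
def pixel_probability(cum_rate, cum_pixels, expected_slide_pixels):
    """Probability that a given pixel is hit by at least one future landslide.

    cum_rate              : cumulative prediction rate up to the class, i.e. the
                            share of future landslide pixels expected there
    cum_pixels            : cumulative number of pixels up to the class
    expected_slide_pixels : assumed total number of future landslide pixels
                            (expected frequency times mean size, in pixels)
    Assumes each future landslide pixel falls independently of the others.
    """
    p_single = cum_rate / cum_pixels   # chance that one landslide pixel lands here
    return 1.0 - (1.0 - p_single) ** expected_slide_pixels

# Made-up scenario: the top 5% of a 532,000-pixel map (26,600 pixels) is
# assumed to capture 30% of 1,000 future landslide pixels.
p = pixel_probability(cum_rate=0.30, cum_pixels=26_600, expected_slide_pixels=1_000)
```

Multiplying such a probability by the value of the assets exposed on the pixel and their vulnerability would give one simple expression of risk, which is the kind of follow-on analysis the estimate is meant to support.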

Discussion and prospect

We have demonstrated a two-stage approach to testing the accuracy of a spatial prediction by a cross-validation technique. A fuzzy-set model has established spatial relations between the occurrences of landslides and explanatory observations to obtain hazard classes in a prediction map. Different spatial and temporal subdivisions of the occurrences and of the study area permit a goodness-of-fit test of the predictions. Comparison of a temporal prediction, which serves as a baseline, with

Acknowledgments

This research is partly supported by the “Pathways” project under the SDKI Program (Sustainable Development through Knowledge Integration) of the Earth Sciences Sector of Natural Resources Canada. Partial support also is acknowledged for a research network project on the “Assessment of Landslide Risk and Mitigation in Mountain Areas, ALARM” (Contract EVG1-CT-2001-00038) of the European Commission's Fifth Framework Programme (http://www.spinlab.vu.nl/alarm). In addition, the authors thank

References (22)

Chung, C.F. Using likelihood ratio functions for modeling the conditional probability of occurrence of future landslides for risk assessment. Computers & Geosciences (2006)

Guzzetti, F., et al. Landslide hazard evaluation: a review of current techniques and their application in a multi-scale study, Central Italy. Geomorphology (1999)

Cacoullos, T. Discriminant Analysis and Applications (1973)

Chung, C.F., et al. Validation of spatial prediction models for landslide hazard mapping. Natural Hazards (2003)

Chung, C.F., et al. Systematic procedures of landslide hazard mapping for risk assessment using spatial prediction models

Chung, C.F., et al. A strategy for sustainable development of nonrenewable resources using spatial prediction models

Cox, D.R. Analysis of Binary Data (1970)

Geisser, S. A predictive approach to the random effect model. Biometrika (1974)

Johnson, N.L., et al. Discrete Distributions (1969)

Kshirsagar, A.M. Multivariate Analysis (1972)

