A comparative assessment of decision trees algorithms for flash flood susceptibility modeling at Haraz watershed, northern Iran
Introduction
Worldwide, floods affect more than 20,000 lives per year (Sarhadi et al., 2012). In Asia, about 90% of all human losses are due to natural hazards, mostly caused by floods (Dutta and Herath, 2004; Smith, 2013). Flooding occurs when a river's discharge exceeds its channel's capacity, causing the river to overflow onto its floodplain. The most common cause of flooding is prolonged heavy rainfall (Casale and Margottini, 1999). A flash flood is a rapid flooding of geomorphic low-lying areas caused by extremely heavy rainfall over a short time, or by sudden dam or levee breaks, rock slides and/or mudslides (debris flows) (Elkhrachy, 2015). Iran has recently experienced many devastating flash floods in the northern parts of the country, at Noshahr (2012), Neka (2013), Behshahr (2013), and Sari City (2015) (Khosravi et al., 2016).
The main aim of the present flood modeling is to develop flood susceptibility maps for a frequently flood-affected watershed. Owing to the complex, non-linear and dynamic structure of watersheds, floods cannot be modeled using simple non-linear hydrological models (Sahoo et al., 2009). Therefore, problems of flood forecasting and mapping with some physically-based rainfall-runoff models still exist (Sahoo et al., 2009). One of the key solutions in future flash flood management and mitigation is the detection of flood-prone areas using appropriate methods with high precision (Youssef et al., 2011a).
Many statistical and machine learning methods are available for flood susceptibility modeling. Statistical models for flood prediction include frequency ratio (Lee et al., 2012; Youssef et al., 2016), weights-of-evidence (Tehrany et al., 2014; Youssef et al., 2015b), and multiple criteria decision methods (Papaioannou et al., 2015; Stefanidis and Stathis, 2013; Youssef et al., 2011b). In recent years, machine learning methods such as artificial neural networks (Radmehr and Araghinejad, 2014), logistic regression (Youssef et al., 2015a), support vector machines (Tehrany et al., 2014) and decision trees (Tehrany et al., 2013) have been investigated for flood modeling with promising results. Among these methods, the Decision Tree (DT) is well suited to flood susceptibility mapping and has shown high prediction performance (Tehrany et al., 2013). However, the use of DT models for flash flood assessment is still limited.
The DT provides a transparent tree-like structure with easily interpreted rules (Tien Bui et al., 2016a). Other advantages of the DT method are: (1) it is a type of statistical analysis with no statistical distribution assumption, (2) it can handle data from various scales, (3) it permits identification of homogeneous groups with various susceptibility levels, and (4) it facilitates the construction of rules for prediction of complex relationships (Tehrany et al., 2013). The DT can also be used for real-time flood forecasting with respect to water level rise and water flow (Han et al., 2002).
Logistic Model Trees (LMT), Reduced Error Pruning Trees (REPT), Naïve Bayes Trees (NBT) and Alternating Decision Trees (ADT) are advanced DT methods. The main objective of this study is therefore to apply these models (LMT, REPT, NBT, and ADT) in the study area and compare their results in order to select the best flash flood susceptibility assessment model. The Haraz Watershed (Mazandaran Province), a flash-flood-prone area of northern Iran, was selected as the study area. Statistical evaluation measures, the Receiver Operating Characteristic (ROC) curve, and the Friedman and Wilcoxon signed-rank tests were used to validate and compare the predictive capability of the models. Data processing and modeling were done using ArcMap 10.2 and Weka 3.7.12 software.
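The ROC-based comparison mentioned above can be illustrated with a minimal sketch. It computes the area under the ROC curve (AUC) as the fraction of (flood, non-flood) pairs in which the flood location receives the higher susceptibility score. The function name `roc_auc` and the labels and scores are hypothetical, chosen only for illustration; they are not data or code from the study, which used Weka.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 for a flood location, 0 for a non-flood location (hypothetical).
    scores: model susceptibility scores; higher means more flood-prone.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0      # flood location ranked above non-flood
            elif p == n:
                wins += 0.5      # ties count as half
    return wins / (len(pos) * len(neg))


# Hypothetical validation set: three flood and three non-flood locations.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

An AUC of 0.5 corresponds to random ranking and 1.0 to perfect separation, so models can be compared on the same validation set by their AUC values.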
Logistic Model Trees (LMT)
The LMT is a classification method that combines a decision tree (the C4.5 algorithm) with logistic regression. It builds on an earlier idea of a model composed of a tree structure with a set of inner nodes and leaves, or terminal nodes (Quinlan, 1993). The C4.5 algorithm is used at the nodes, and logistic regression functions are used at the leaves (Quinlan, 1993). Linear logistic regression is used to find the posterior probability in a leaf node with
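A minimal sketch of how a leaf might turn per-class linear functions into class posteriors, assuming the standard multinomial logistic (softmax) form. This form is an assumption, not the source's own expression, and the function name `leaf_posterior`, the weights, and the input values are all hypothetical.

```python
import math


def leaf_posterior(x, weights, biases):
    """Class posteriors at one leaf from per-class linear functions.

    Assumes the usual softmax form: P(c | x) = exp(L_c(x)) / sum_k exp(L_k(x)),
    where L_c(x) = bias_c + sum_i weight_c[i] * x[i].
    """
    scores = [
        b + sum(w_i * x_i for w_i, x_i in zip(w, x))
        for w, b in zip(weights, biases)
    ]
    m = max(scores)                      # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical two-class leaf (non-flood, flood) with two input factors.
x = [0.5, 1.2]
weights = [[1.0, -0.5], [-1.0, 0.5]]
biases = [0.0, 0.0]
probs = leaf_posterior(x, weights, biases)
print(probs)
```

The returned values sum to one, so the entry for the flood class can be read directly as a susceptibility score for that location.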
Study area
The Haraz watershed, located south of Amol City in Mazandaran Province, Iran, was selected as the study area because it is prone to destructive floods (Fig. 1). The area is hilly and mountainous with intervening valleys, with altitudes ranging from 328 m to 5600 m (a.s.l.), and covers about 4014 km². The climate of Haraz is a combination of the moderate cold climate of mountainous regions and the mild humid climate of the Caspian shoreline area. The average annual rainfall is around 430 mm.
The study area
Flash flood inventory
There are several types of floods, such as flash floods, coastal floods, urban floods, fluvial floods and pluvial floods. In the study area, flash floods have reportedly caused houses and buildings to collapse, water to rise above human height, and damage to roads and other infrastructure, in addition to loss of life and property. Therefore, in the present study, flash floods were selected for flood susceptibility modeling. Flood inventory map for the study area was constructed using historical data of
Application of decision trees algorithms (LMT, REPT, NBT, and ADT) in the flash flood susceptibility assessment of the study area
For this task, we adopted two statistical assumptions: (i) floods are controlled by mechanical laws which can be determined statistically and empirically, and (ii) future flood events will occur under the same conditions that produced them in the past (Tien Bui et al., 2016d). Main steps of flash flood susceptibility assessment include: (i) data collection and analysis, (ii) selection and analysis of flood-influencing factors, and (iii) flood susceptibility modeling and validation, and
Selection of flood-influencing factors
Factors that contribute nothing to the modeling results, and may even reduce the prediction capability of the models (Tien Bui et al., 2016c), should be removed to improve model performance (Pham et al., 2016a). In the present study, the information gain ratio and multicollinearity diagnostics, namely variance inflation factors and tolerances, were selected for testing the factors for modeling.
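The two screening ideas named above can be sketched as follows, under simplifying assumptions: factors are already discretized into classes, the gain ratio is the information gain divided by the split information (the C4.5-style definition), and tolerance and VIF are derived from an R² value supplied by the caller. The function names and the toy data are hypothetical and do not come from the study.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def gain_ratio(factor, labels):
    """Information gain of `factor` about `labels`, divided by split information."""
    n = len(labels)
    groups = {}
    for f, y in zip(factor, labels):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(factor)
    return info_gain / split_info if split_info > 0 else 0.0


def tolerance(r_squared):
    """Tolerance = 1 - R^2, where R^2 comes from regressing one factor on the others."""
    return 1.0 - r_squared


def vif(r_squared):
    """Variance inflation factor = 1 / tolerance."""
    return 1.0 / tolerance(r_squared)


# Hypothetical toy data: 1 = flood, 0 = non-flood.
flood = [1, 1, 0, 0]
slope_class = ["steep", "steep", "flat", "flat"]   # perfectly informative
noise = ["a", "b", "a", "b"]                        # carries no information

print(gain_ratio(slope_class, flood), gain_ratio(noise, flood))
print(tolerance(0.8), vif(0.8))
```

A factor whose gain ratio is zero adds no predictive information and is a candidate for removal, while a low tolerance (equivalently, a high VIF) signals that a factor is largely explained by the others.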
Conclusions
In this study, four state-of-the-art machine learning decision tree models, LMT, REPT, NBT and ADT, were applied and evaluated for the first time for flood susceptibility mapping at the Haraz Watershed, northern Iran. Eleven flood-influencing factors, namely ground slope, altitude, curvature, SPI, TWI, land use, rainfall, river density, distance from river, lithology, and NDVI, were initially selected for the model study. The predictive capability of these influencing factors was tested using the IGR
Acknowledgement
The authors are thankful to the Head of the Institute of Sari Agricultural Science and Natural Resources University, Iran; University of Kurdistan, Iran; University of Transport Technology, Ha Noi, Viet Nam; Norwegian University of Life Sciences, Norway; Bhaskaracharya Institute for Space Applications and Geo-Informatics (BISAG), India; and University College of Southeast Norway for sharing information and providing facilities to carry out this research work.
Conflict of interest
The authors declare that there is no conflict of interest.
References
Spatial probabilistic multi-criteria decision making for assessment of flood management alternatives. J. Hydrol. (2016)
Comparison of aligned Friedman rank and parametric methods for testing interactions in split-plot designs. Comput. Stat. Data Anal. (2003)
A novel hybrid artificial intelligence approach for flood susceptibility assessment. Environ. Model Softw. (2017)
A comparative study of population-based optimization algorithms for downstream river flow forecasting by a hybrid neural network model. Eng. Appl. Artif. Intell. (2015)
Flash flood hazard mapping using satellite images and GIS tools: a case study of Najran City, Kingdom of Saudi Arabia (KSA). Egypt. J. Remote Sens. Space. Sci. (2015)
Urban flood hazard zoning in Tucumán Province, Argentina, using GIS and multicriteria decision analysis. Eng. Geol. (2010)
Modeling of groundwater level fluctuations using dendrochronology in alluvial aquifers. J. Hydrol. (2015)
Roles of saltcedar (Tamarix spp.) and capillary rise in salinizing a non-flooding terrace on a flow-regulated desert river. J. Arid Environ. (2012)
Spatial prediction of landslide hazard at the Yihuang area (China) using two-class kernel logistic regression, alternating decision tree and support vector machines. Catena (2015)
Estimating classification error rate: repeated cross-validation, repeated hold-out and bootstrap. Comput. Stat. Data Anal. (2009)
A comparative study of different machine learning methods for landslide susceptibility assessment: a case study of Uttarakhand area (India). Environ. Model Softw.
Hybrid integration of Multilayer Perceptron Neural Networks and machine learning ensembles for landslide susceptibility assessment at Himalayan area (India) using GIS. Catena
Simplifying decision trees. Int. J. Man Mach. Stud.
Forecasting stream water temperature using regression analysis, artificial neural network, and chaotic non-linear dynamic models. J. Hydrol.
Probabilistic flood inundation mapping of ungauged rivers: linking GIS techniques and frequency analysis. J. Hydrol.
Sparse alternating decision tree. Pattern Recogn. Lett.
Neural network river forecasting through baseflow separation and binary-coded swarm optimization. J. Hydrol.
Spatial prediction of flood susceptible areas using rule based decision tree (DT) and a novel ensemble bivariate and multivariate statistical models in GIS. J. Hydrol.
Flood susceptibility mapping using a novel ensemble weights-of-evidence and support vector machine models in GIS. J. Hydrol.
Flood susceptibility assessment using GIS-based support vector machine model with different kernel types. Catena
Deterministic and probabilistic flood modeling for contemporary and future coastal and inland precipitation inundation. Appl. Geogr.
Hybrid artificial intelligence approach based on neural fuzzy inference model and metaheuristic optimization for flood susceptibility modelling in a high-frequency tropical cyclone area using GIS. J. Hydrol.
Comparison of a logistic regression and Naïve Bayes classifier in landslide susceptibility assessments: the influence of models complexity and training dataset size. Catena
A comparative assessment of support vector regression, artificial neural networks, and random forests for predicting and mapping soil organic carbon stocks across an Afromontane landscape. Ecol. Indic.
Prediction of rainfall time series using modular artificial neural networks coupled with data-preprocessing techniques. J. Hydrol.
A guide to using the collinearity diagnostics. Comput. Econ.
Classification and Regression Trees
Landslide susceptibility analysis in the Hoa Binh province of Vietnam using statistical index and logistic regression. Nat. Hazards
Floods and Landslides: Integrated Risk Assessment; with 30 Tables
A hybrid model coupled with singular spectrum analysis for daily rainfall prediction. J. Hydroinf.
GIS-based landslide susceptibility modelling: a comparative assessment of kernel logistic regression, Naïve-Bayes tree, and alternating decision tree models. Geomat. Nat. Haz. Risk
Collinearity: a review of methods to deal with it and a simulation study evaluating their performance. Ecography
Trend of floods in Asia and flood risk management with integrated river basin approach
Significance Probabilities of the Wilcoxon Test. Ann. Math. Stat.
The alternating decision tree learning algorithm
The use of ranks to avoid the assumption of normality implicit in the analysis of variance. J. Am. Stat. Assoc.
Improved decision tree induction algorithm with feature selection, cross validation, model complexity and reduced error pruning. Int. J. Comput. Sci. Info. Technol.
Multivariate Data Analysis
River flow modelling using fuzzy decision trees. Water Resour. Manag.
GIS-based landslide spatial modeling in Ganzhou City, China. Arab. J. Geosci.
A GIS-based flood susceptibility assessment and its mapping in Iran: a comparison between frequency ratio and weights-of-evidence bivariate statistical models with multi-criteria decision-making technique. Nat. Hazards