Geological Hazard Susceptibility Analysis Based on RF, SVM, and NB Models, Using the Puge Section of the Zemu River Valley as an Example

Li, Ming; Li, Linlong; Lai, Yangqi; He, Li; He, Zhengwei; Wang, Zhifei

doi:10.3390/su151411228

¹ State Key Laboratory of Geohazard Prevention and Geoenvironment Protection, Chengdu 610059, China

² College of Earthscience, Chengdu University of Technology, Chengdu 610059, China

³ College of Resources and Environment, Xichang University, Xichang 615000, China

⁴ College of Tourism and Urban-Rural Planning, Chengdu University of Technology, Chengdu 610059, China

* Author to whom correspondence should be addressed.

Sustainability 2023, 15(14), 11228; https://doi.org/10.3390/su151411228

Submission received: 15 June 2023 / Revised: 11 July 2023 / Accepted: 13 July 2023 / Published: 19 July 2023

(This article belongs to the Section Hazards and Sustainability)

Abstract:

The purpose of this study was to construct a geological hazard susceptibility evaluation model using three machine learning models, namely random forest (RF), support vector machine (SVM), and naive Bayes (NB), and to evaluate the susceptibility to landslides, using the Puge section of the Zemu River valley in the Liangshan Yi Autonomous Prefecture as the study area. First, 89 shallow landslide and debris flow locations were identified through field surveys and remote sensing interpretation. Eight hazard-causing factors, namely slope, aspect, rock group, land cover, distance to road, distance to river, distance to fault, and normalized difference vegetation index (NDVI), were selected to evaluate their spatial relationship with landslide occurrence. The weighting of the hazard-causing factors indicates that rock group and distance to river contribute most to the occurrence of geological hazards. Comparing all the indices of the three models, the random forest model performed best, with an area under the ROC curve (AUC) of 0.87, a root mean squared error (RMSE) of 0.118, and a mean absolute error (MAE) of 0.045. The SVM model had the highest sensitivity to geological hazards. The predicted susceptibility matched the actual situation in the study area, and the prediction effects were good. The susceptibility assessments of the three models can support the prevention and control of geological hazards in areas of the same type.

Keywords:

Zemu River valley; geological hazards; machine learning; susceptibility

1. Introduction

Shallow landslides and debris flows are the most important types of geological hazards in the Puge section of the Zemu River valley in the Liangshan Yi Autonomous Prefecture, Sichuan Province. A geological field survey found 89 geological hazards in the study area, comprising 72 shallow landslides and 17 debris flows. The hazards are thus mainly shallow landslides, their overall scale is mainly small to medium, and their frequency and scope are far greater than those of earthquakes. As one of the most high-risk areas for geological hazards in China, Puge County, Liangshan Prefecture, has extremely high vulnerability [1]. Research on geohazards in Puge County and the surrounding counties covers genesis analysis [1], inception conditions [2], and susceptibility [3]. The study of geohazard susceptibility can identify hazardous areas, provide important theoretical support for hazard risk management, help prevent and control hazards more effectively, reduce the threat that geohazards pose to people’s lives and property, and provide a reference for the selection of engineering construction sites. Therefore, it is important to study the geological hazard susceptibility of a region [4].

Current evaluation models for geohazard susceptibility can generally be categorized as qualitative or quantitative. With the development of information technology, quantitative analysis is gradually being applied more frequently than qualitative analysis in geological hazard evaluation [2,3,4,5,6,7]. Qualitative methods rely on experts’ a priori experience and are therefore subject to a certain degree of subjective influence [8]. Quantitative analysis mainly includes statistical models (e.g., logistic regression models and cluster analysis models) [9,10,11,12], machine learning models, and deep learning models. These models can objectively quantify the relationships between geohazards and their causes. Meanwhile, with the development of computer technology, machine learning has become a hot topic and is being applied by more and more researchers in the field of geological hazards [13,14,15,16,17]. It can handle very large data volumes as well as high-dimensional spatial datasets, and enables accurate classification and prediction [18]. Machine learning algorithms such as RF, SVM, NB, the artificial neural network (ANN), and the convolutional neural network (CNN) have been applied in many types of geological hazard evaluation [19], and many researchers have compared the advantages and disadvantages of model performance across a large number of single and coupled models [20,21,22,23].

In the study of the Zemu River rift zone, Wen [24] analyzed the segmental nature of seismic rupture and its causes in the Xianshui River–Anning River–Zemu River rift zone in Western Sichuan based on various hierarchical data, and Xie [4] analyzed the activity characteristics of the Zemu River rift zone and the influence of geological hazards on geomorphological evolution, taking the Goose Palm River basin as an example. Feng [25] systematically revealed the causal mechanism and hazard-causing mode of the geological hazards of large turnip faults in the Zemu River rift zone. Li [26] took the Zemu River fracture zone (Puge section) as the study area, analyzed the geohazard control effects and development patterns in the study area, and applied the weighted information quantity model coupled with the deterministic coefficient model and the information quantity model to evaluate geohazard susceptibility by ArcGIS.

Although researchers around the world have used various methods for geohazard susceptibility assessment, there is no consensus on which model is most suitable for a specific region, so it is necessary to apply several methods in the study area to determine the one with the highest predictive power. This paper differs from the above-mentioned studies in that three machine learning models, RF, SVM, and NB, are built in Python to analyze the susceptibility of the study area to geohazards, to quickly and accurately predict its susceptibility to shallow landslides and debris flows, and to compare the characteristics, strengths, and weaknesses of the three models based on their prediction results and the area under the ROC curve (AUC). The research results can provide some scientific basis for hazard prevention and mitigation in areas of the same type.

2. Overview of the Study Area

2.1. Geological Overview of the Study Area

The study area covers 100.03 km² and is located in Puge County, Liangshan Yi Autonomous Prefecture, Sichuan, with the geographical coordinates of 100°03′20″~101°40′00″ E and 27°40′30″~29°10′20″ N (Figure 1). It belongs to the Zemu River valley area of the Hengduan Mountains on the Yunnan–Guizhou Plateau, and is located in the transition zone from the first to the second step of China’s terrain, which is a typical alpine valley landscape. Complex fracture structures in the area mainly consist of a series of near north–south trending pressure fractures and low-order pressure torsion or tension–torsion type fractures. The valley, shaped by erosion and accumulation from the Zemu River, is an important zone of economic activity in Puge County, and is controlled by geological structures and a plateau climate. Erosion and denudation landforms are highly developed in this area, the topography is complex, the terrain changes frequently, the relative height difference is large, and the stratigraphy is complex. The regional tectonics are mainly controlled by the Zemu River fault, which forms a special fault block unit resulting in geomorphic activity, and provides the basis for the development of regional geological hazards. In addition, human engineering and economic activities in the study area are intense, and the conflict between humans and land is very prominent, with the majority of arable land and residential bases being used for engineering and construction. This makes the natural balance of the slope rock and soil masses in the area vulnerable to disruption and induces shallow landslides and debris flows.

2.2. Climate Profile of the Study Area

The average annual temperature in the study area is low in the north and high in the south. Based on many years of data, the average annual temperature in the Pingba Valley is 17–19 °C, the average temperature of the hottest month (mostly July) is 21–24 °C, and the average temperature of the coldest month is 8–11 °C. The extreme maximum temperature is 41 °C, and the extreme minimum temperature is −6.7 °C. The difference between winter and summer temperatures is not large, so the annual temperature range is small, while the daily temperature range is large. These abrupt swings between hot and cold intensify the weathering of slope rock and produce natural slope cutting or natural stripping, eventually changing the shape and gradient of the slope; this effect on road-cut slopes is worthy of attention.

Precipitation in the study area mainly comes in the form of rain and snow, with rainfall below 3000 m above sea level and snowfall above. Alternating or mixed rain and snow tends to cause the soil to contract and expand, resulting in stress changes in the soil layer, making the soil less stable and possibly triggering geological disasters such as landslides [27]. Based on many years of data, the average annual precipitation is 1176.3 mm, and the maximum annual precipitation is 1291.2 mm. Precipitation varies greatly from year to year and is unevenly distributed within an individual year. From May to October, with the beginning of the monsoon season, rainfall increases dramatically and becomes more concentrated, accounting for 89.2% of the annual precipitation. Short-term heavy rainfall has a relatively large impact on slope stability: it can easily trigger disasters such as floods and mudslides, which cause significant damage to slopes. In addition, short-term heavy rainfall accelerates the infiltration of water into the soil and reduces the stability of slopes, leading to landslides and slope collapse over a relatively short period of time.

3. Materials and Methods

3.1. Dataset Preparation

Projection transformation, calibration, resampling, and other operations were performed on the collected eight-factor data (Table 1), and the data were unified to raster cells at 30 × 30 m resolution. Then, the existing geological hazards in the study area were determined via remote sensing interpretation, and the results were verified against the field geological hazard survey of the Zemu Valley, Puge County (Figure 2). Subsequently, non-hazard areas in the study area were uniformly delineated. Finally, the eight-factor data within the hazard and non-hazard areas were extracted and randomly divided into training and validation sets at a ratio of 2/3:1/3.

3.2. Establishment of Susceptibility Evaluation Index

When evaluating the susceptibility of a region to geological hazards, the correct selection of hazard-causing factors is the first step. Following previous studies, we selected a total of eight factors: slope, aspect, rock group, distance to road, distance to river, distance to fault, land cover, and normalized difference vegetation index (NDVI) [28,29,30,31,32,33].

3.2.1. Slope

Slope is an important factor controlling the occurrence of geological hazards (Figure 3a). It not only describes topographic and geomorphological characteristics quantitatively, but also correlates well with the development of landslide hazards [34]. Most of the geological hazards in Puge County developed on slopes with gradients under 40°, and the number of geological hazards first increases and then decreases with increasing slope gradient [3]. The largest number of geohazards in the study area developed on slopes between 10° and 30°, accounting for 86.52% of the total. The slope provides a free face for the hazards, controls the friction on the landslide surface, and affects the probability of slope displacement. Slope data were extracted from 30 m resolution DEM data of the study area.

3.2.2. Aspect

Aspect is an important indicator of terrain characteristics and a basic input for geological analysis models, such as hydrological, surface material movement, soil erosion, and land-use planning models. It indirectly represents the undulating morphology and structure of the terrain, which are important factors affecting the stability of slopes [35]. In addition, the duration of sunshine and the intensity of solar radiation differ between aspects (Figure 3b). Generally speaking, sunny slopes receive longer sunshine hours and stronger solar radiation, which leads to better vegetation development and accelerates the weathering of rock and soil masses, thus influencing the stability of the slopes.

3.2.3. Rock Group

There is a strong relationship between the occurrence of geological hazards and rock group (Figure 3c). Under tension or stress, rock groups influence the development of joint fractures. In addition, the variability of the upper and lower rock layers can create unconformities that favor the occurrence of geological hazards. Each stratum has different resistance to geological hazards, and it is generally believed that the higher the resistance, the less likely the occurrence of geological hazards [36,37]. According to the degree of rock hardness, rock groups are divided into hard rock, harder rock, softer rock, soft rock, and loose rock.

3.2.4. Distance to Road

Previous studies [38] and fieldwork show that the study area is alpine canyon terrain with extensive forests and relatively inconvenient transportation. In recent years, as a result of vigorous economic development, the construction of roads and fire access roads in mountainous areas has accelerated. These projects usually involve cutting into slopes, which destroys the stability of the original slope, and roads have thus become a key factor affecting the development of shallow landslides (Figure 3d).

3.2.5. Distance to River

Rivers not only have a shaping effect on river banks, but also their distribution and density characterize the infiltration capacity of regional soils and the water content level of soils (Figure 3e). Related studies have pointed out that there is a close correlation between groundwater content and the development of geological hazards [39]. Therefore, distance to a river becomes one of the key factors in analyzing the susceptibility, danger, and risk of geological hazards [28,29,30,31,32].

3.2.6. Distance to Fault

The Zemuhe fault zone is the main active fault in the study area, and the seismic activity in the area is strong. The earthquake source is generally developed on the fault, and the closer to the fault zone, the greater the number and density of earthquake-induced geological hazard developments [40]. At the same time, the joints and fissures near the fault zone are more developed, which then develop into landslide back wall fissures and increase the probability of landslide geological hazards (Figure 3f).

3.2.7. Land Cover

In the context of rapid urbanization, human activities have become an important factor influencing the distribution of geological hazards [41]. Land cover types reflect different human modifications of rock and soil masses, and these modifications destroy the natural environment’s ability to self-regulate to a certain extent. For example, the construction of winding mountain roads reduces the natural soil’s ability to absorb rainwater, which leads to the accumulation of rainwater and the scouring of slopes; this can easily cause soil erosion and even trigger geological hazards. Therefore, it is generally believed that the more fragile the land, the greater the possibility of geological hazards [28,42] (Figure 3g).

3.2.8. NDVI

NDVI reflects the development of surface vegetation, and more and more scholars regard NDVI as an indicator of the land cover characteristics that affect the occurrence of landslide hazards [43]. The higher the vegetation index, the better the development of surface vegetation, the greater the biomass, and the stronger the ability to block wind, consolidate soil, and weaken the scouring of the surface by rainwater. It is generally believed that the higher the vegetation index, the lower the probability of geological hazards [29,31] (Figure 3h).
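As a brief illustration of how the index is obtained, NDVI is conventionally computed per pixel from near-infrared and red reflectance as (NIR − Red)/(NIR + Red). The sketch below assumes hypothetical reflectance values; the function name and the zero-denominator convention are illustrative choices, not taken from the study.

```python
def ndvi(nir: float, red: float) -> float:
    """Return NDVI = (NIR - Red) / (NIR + Red) for a single pixel."""
    total = nir + red
    if total == 0:
        # Assumption for this sketch: a pixel with no signal gets NDVI 0.
        return 0.0
    return (nir - red) / total


# Hypothetical reflectance values, chosen only to contrast two surfaces.
dense_vegetation = ndvi(nir=0.50, red=0.08)
bare_soil = ndvi(nir=0.30, red=0.25)
print(round(dense_vegetation, 3), round(bare_soil, 3))
```

Densely vegetated pixels reflect strongly in the near-infrared and absorb red light, so they score closer to 1 than bare ground does.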

3.3. Geological Hazard Susceptibility Model

3.3.1. RF Model

The RF model is one of the most commonly used ensemble algorithms. It draws multiple samples from the original data via repeated bootstrap sampling (sampling with replacement), constructs a decision tree for each sample, and then combines these trees, with each tree casting one vote, to achieve classification and prediction. The RF algorithm has multiple advantages, the most important of which is that, with a very large amount of sample data and many feature elements, it generates fewer errors and is less likely to overfit. RF has also been widely and effectively used for geohazard identification and for geohazard susceptibility analysis, where it provides the importance of variables [31,32,44]. The flow of the RF model is shown in Figure 4.
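The two ingredients described above, resampling with replacement and voting, can be sketched in a few lines. This is a deliberately simplified toy, not the study's implementation: each "tree" is a single-threshold stump on one hypothetical feature (slope in degrees), and the per-split feature subsampling of a real random forest is omitted. All data and function names are assumptions made for illustration.

```python
import random
from collections import Counter


def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]


def fit_stump(sample):
    """Toy 'tree': a threshold halfway between the two class means."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    if not pos or not neg:
        # Degenerate bootstrap sample: always predict the only class seen.
        label = 1 if pos else 0
        return lambda x: label
    pos_mean = sum(pos) / len(pos)
    neg_mean = sum(neg) / len(neg)
    threshold = (pos_mean + neg_mean) / 2
    pos_is_high = pos_mean > neg_mean
    return lambda x: int((x > threshold) == pos_is_high)


def forest_predict(trees, x):
    """Majority vote over all trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical (slope, label) pairs; 1 = hazard, 0 = non-hazard.
data = [(5, 0), (8, 0), (10, 0), (12, 0), (25, 1), (28, 1), (30, 1), (35, 1)]
rng = random.Random(0)
trees = [fit_stump(bootstrap_sample(data, rng)) for _ in range(25)]
print(forest_predict(trees, 32), forest_predict(trees, 6))
```

Because every stump sees a slightly different resample, individual thresholds differ, and the vote averages out that variation.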

3.3.2. SVM Model

SVM is a classifier proposed by Cortes and Vapnik. Its basic principle is to find an optimal separating hyperplane in the sample space such that the distance from the hyperplane to the nearest sample points of the two classes is maximized (Figure 5). The larger this margin, the better the generalization of the classifier and the lower the error. For data that are not linearly separable, SVM maps the low-dimensional data into a high-dimensional space in which a hyperplane can separate the positive and negative classes with the maximum margin; this works even with relatively few samples and gives the SVM good robustness [45,46,47]. SVM has also been widely used in the analysis of geohazard susceptibility, with good results [29,30].
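To make the margin idea concrete, the sketch below computes the geometric margin of two candidate hyperplanes on a tiny hypothetical dataset. It does not train an SVM; it only evaluates the quantity that SVM training maximizes. The points, labels, and hyperplane parameters are all assumptions for illustration.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance y * (w.x + b) / ||w|| over all samples."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )


# Hypothetical 2-D samples: class -1 lies on x = 0, class +1 on x = 2.
points = [(0.0, 0.0), (0.0, 1.0), (2.0, 0.0), (2.0, 1.0)]
labels = [-1, -1, 1, 1]

# Two separating hyperplanes: x = 1 (centered) and x = 0.5 (shifted).
margin_centered = geometric_margin((1.0, 0.0), -1.0, points, labels)
margin_shifted = geometric_margin((1.0, 0.0), -0.5, points, labels)
print(margin_centered, margin_shifted)  # 1.0 0.5
```

Both hyperplanes classify every point correctly, but the centered one keeps the nearest points twice as far away, which is the one a maximum-margin classifier prefers.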

3.3.3. NB Model

The NB algorithm is an important algorithm in machine learning and data mining. Its fundamental idea is that, for an item to be classified, the probability of each category given the item’s observed features is computed, and the item is assigned to the category with the highest probability. It is mainly used for classification in machine learning, and has wide applications in text analysis, opinion analysis, medical diagnosis, user preference analysis, geographic information service matching, etc. [48]. In recent years, the naive Bayes method has also been applied to the evaluation of geohazard susceptibility [33,49].
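The classification rule described above can be sketched for categorical features as follows. This is a minimal illustration under stated assumptions: the six training rows (rock group, river proximity) are hypothetical, and Laplace (add-one) smoothing is added here for numerical safety; the source does not say which NB variant or smoothing was used.

```python
from collections import Counter, defaultdict


def fit_naive_bayes(rows, labels):
    """Count classes and, per (feature index, class), the observed values."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[(i, label)][value] += 1
    return class_counts, feature_counts


def posterior(row, class_counts, feature_counts, n_values):
    """Return P(class | row), assuming features are independent given the class."""
    total = sum(class_counts.values())
    scores = {}
    for label, count in class_counts.items():
        score = count / total  # prior P(class)
        for i, value in enumerate(row):
            # Laplace-smoothed likelihood P(value | class).
            score *= (feature_counts[(i, label)][value] + 1) / (count + n_values[i])
        scores[label] = score
    norm = sum(scores.values())
    return {label: s / norm for label, s in scores.items()}


# Hypothetical samples: (rock group, distance to river); 1 = hazard.
rows = [("soft", "near"), ("soft", "near"), ("soft", "far"),
        ("hard", "near"), ("hard", "far"), ("hard", "far")]
labels = [1, 1, 1, 0, 0, 0]

class_counts, feature_counts = fit_naive_bayes(rows, labels)
post = posterior(("soft", "near"), class_counts, feature_counts, n_values=[2, 2])
print(round(post[1], 3))  # 0.857
```

The item is assigned to whichever class has the larger posterior, here the hazard class.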

3.3.4. Model Construction

Model construction is a key step in geological hazard susceptibility analysis, and the specific model construction and workflow are shown in Figure 6. The code availability section is in Appendix A. First, hazard and non-hazard areas were delineated based on remote sensing interpretation and transformed into point sets. Second, eight factors were selected, based on geological and environmental conditions and human engineering activities, as the main factors affecting the development of shallow landslides and debris flows in the study area. ArcGIS software was used to extract these attributes from the hazard areas as positive samples and from the non-hazard areas as negative samples in order to establish the dataset. The negative samples were selected evenly from non-hazard areas 50 m away from the positive sample areas, so that their number was similar to that of the positive samples. Then, the dataset was randomly divided into training and test sets. The training set (2/3 of the samples) was used to construct the RF, SVM, and NB models using the Python-based scikit-learn (sklearn) library, and the test set (the remaining 1/3) was used for model testing. Finally, all the data of the study area were input into the models to obtain the predicted susceptibility to shallow landslides and debris flows, the hazard-causing factor weights, and the model accuracy results.
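The workflow above can be sketched with scikit-learn as follows. This is not the authors' code (which the paper places in Appendix A): the data are synthetic stand-ins for the eight-factor samples, and the hyperparameters, the stratified split, the feature scaling before the SVM, and the choice of `GaussianNB` as the NB variant are all assumptions made for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the eight-factor dataset (balanced classes).
rng = np.random.default_rng(0)
n_per_class = 150
X = np.vstack([
    rng.normal(loc=1.0, scale=1.0, size=(n_per_class, 8)),   # positive samples
    rng.normal(loc=-1.0, scale=1.0, size=(n_per_class, 8)),  # negative samples
])
y = np.concatenate([np.ones(n_per_class, dtype=int),
                    np.zeros(n_per_class, dtype=int)])

# Random 2/3 : 1/3 split; stratification is an assumption of this sketch.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=1 / 3, stratify=y, random_state=0
)

models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "SVM": make_pipeline(StandardScaler(), SVC(probability=True, random_state=0)),
    "NB": GaussianNB(),
}

# Fit each model and keep the predicted probability of the hazard class.
probabilities = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    probabilities[name] = model.predict_proba(X_test)[:, 1]
```

Applying `predict_proba` to every raster cell of the study area, instead of the test set, would yield the per-cell susceptibility values used for mapping.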

3.3.5. Model Accuracy Analysis Methods

The ROC curve is one of the most commonly used tools to test the accuracy of geological hazard susceptibility evaluation [50]. For a binary classification model, the ROC curve is plotted with the true positive rate on the vertical axis and the false positive rate on the horizontal axis. The area under the ROC curve (AUC) summarizes the accuracy of the evaluation results in a single value with good objectivity and validity; the larger its value, the higher the prediction accuracy.
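The construction of the curve and its area can be sketched without any library. The labels and scores below are hypothetical; the function names are illustrative. The curve is traced by lowering the decision threshold one distinct score at a time, and the area is obtained with the trapezoid rule.

```python
def roc_curve_points(y_true, scores):
    """Return (false positive rate, true positive rate) points of the ROC curve."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point only after the last sample sharing this score (ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            points.append((fp / n_neg, tp / n_pos))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical labels (1 = hazard) and predicted probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = auc_trapezoid(roc_curve_points(y_true, scores))
print(round(auc, 3))  # 0.889
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect separation of hazard and non-hazard samples.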

MAE and RMSE are commonly used regression evaluation metrics that measure the difference between predicted and true values. MAE is the mean of the absolute deviations between the predicted and true values, and RMSE is the square root of the sum of the squared deviations divided by the number of observations. The lower the RMSE and MAE, the better the performance of the model, and the easier it is to identify remaining errors in the model [51,52,53].
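Both metrics follow directly from their definitions. The sketch below uses hypothetical labels and predicted probabilities; the function names are illustrative.

```python
import math


def mean_absolute_error(y_true, y_pred):
    """Mean of |prediction - truth| over all observations."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def root_mean_squared_error(y_true, y_pred):
    """Square root of the mean squared deviation."""
    return math.sqrt(
        sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


# Hypothetical true labels and predicted hazard probabilities.
y_true = [1, 0, 1, 0]
y_prob = [0.9, 0.2, 0.6, 0.1]

mae = mean_absolute_error(y_true, y_prob)
rmse = root_mean_squared_error(y_true, y_prob)
print(round(mae, 3), round(rmse, 3))  # 0.2 0.235
```

RMSE is never smaller than MAE, and the gap widens when a few predictions are far off, because squaring weights large deviations more heavily.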

In addition, four indices, namely accuracy (ACC), precision, recall, and F1, are used to evaluate the performance of the model. ACC is the overall proportion of correct predictions, precision is the proportion of predicted hazard samples that are true hazards, recall is the proportion of true hazard samples that are correctly predicted, and F1 is the harmonic mean of precision and recall. High values of all four indices are generally considered ideal. The predictions obtained with the validation set as model input are compared with the true labels to compute these indices.
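The four indices can be derived from the counts of true and false positives and negatives. The labels below are hypothetical, and the helper name is an illustrative choice.

```python
def classification_indices(y_true, y_pred):
    """Return ACC, precision, recall, and F1 for binary labels (1 = hazard)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    acc = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"ACC": acc, "precision": precision, "recall": recall, "F1": f1}


# Hypothetical validation labels and model predictions.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
indices = classification_indices(y_true, y_pred)
print(indices)
```

In this toy case the model finds three of four hazards (recall 0.75) but also raises two false alarms (precision 0.6), which F1 balances into a single score of about 0.667.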

4. Results

4.1. Model Accuracy Analysis Results

According to the model results (Figure 7), the AUC values of the RF, SVM, and NB models are 0.870, 0.835, and 0.817, respectively, with the RF model having the largest area under the curve.

The results (Table 2) show that the RF model scores higher than the other two models on the indices, with an ACC of 98.4%, a precision of 98.68%, a recall of 98.13%, and an F1 of 98.4%, as well as an RMSE of 0.118 and an MAE of 0.045, thus demonstrating the accuracy and reliability of the RF model.

It can thus be judged that the RF model has the best prediction accuracy and generalization performance, and can objectively and accurately analyze the susceptibility of a study area to geological hazards; therefore, the RF model was used for the next step, the hazard-causing factor weighting analysis.

4.2. The Results of Hazard Factor Weight Analysis

The weights of each factor in the established RF model were output, and the results are shown in Figure 8. All eight causal factors contribute to the occurrence of geological hazards, among which rock group and distance to river are the most important. Distance to fault, aspect, and slope are the next most important factors, while land cover, NDVI, and distance to road contribute the least to geological hazards.
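One common way to obtain such factor weights from a fitted random forest is scikit-learn's impurity-based `feature_importances_` attribute. Whether the study used this measure or another (for example, permutation importance) is not stated, so the sketch below is an assumption; the data are synthetic, with labels deliberately driven by two columns so that the expected ranking is known in advance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

factor_names = ["slope", "aspect", "rock_group", "land_cover",
                "dist_road", "dist_river", "dist_fault", "ndvi"]

# Synthetic data: only the rock_group and dist_river columns carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
y = (X[:, 2] + X[:, 5] > 0).astype(int)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances sum to 1 across all factors.
ranking = sorted(zip(factor_names, model.feature_importances_),
                 key=lambda pair: pair[1], reverse=True)
for name, weight in ranking:
    print(f"{name:12s} {weight:.3f}")
```

Impurity-based importances can favor continuous or high-cardinality factors, so comparing them with permutation importance is a reasonable cross-check.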

Remote sensing image interpretation and the field investigation of geological hazards in the Zemu River valley in Puge County showed that the topography of the study area is complex and mountainous, and that the overall degree of exploitability is low. The settlement areas and highway construction sites in Puge County are located on both sides of rivers, where the elevation is relatively low, the terrain is relatively flat, and the degree of exploitability is relatively high, which makes these areas prone to forming unstable slopes. At the same time, the closer the slopes are to the rivers, the higher the water content of the soil, and the more likely it is to be destabilized under rainfall. In summary, the established model matches the real situation and has high reliability, and it also shows that topography, water systems, and human activities are the main factors causing the formation of shallow landslides and debris flows in the area.

4.3. Geological Hazard Susceptibility Prediction Results

In order to further study the distribution pattern of regional geohazards, we extracted the data of the eight hazard-causing factors within the study area, and used the established RF model to predict the probability of geohazard occurrence in the study area. The output probabilities all lie between 0 and 1. The prediction results were classified into five classes according to the natural breaks method [54], namely, very low susceptibility (0–0.1), low susceptibility (0.1–0.3), medium susceptibility (0.3–0.55), high susceptibility (0.55–0.8), and very high susceptibility (0.8–1). The prediction map of geological hazard susceptibility in the study area was thus obtained, as shown in Figure 9.
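Once the class breaks are known, assigning each predicted probability to a susceptibility class is a simple lookup. The sketch below uses the breaks 0.1, 0.3, 0.55, and 0.8 from the text; it does not compute natural breaks itself, and assigning a value that falls exactly on a break to the higher class is an assumption of this sketch.

```python
from bisect import bisect_right

BREAKS = [0.1, 0.3, 0.55, 0.8]
CLASSES = ["very low", "low", "medium", "high", "very high"]


def susceptibility_class(probability: float) -> str:
    """Map a predicted probability in [0, 1] to one of five classes."""
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # bisect_right counts how many breaks are <= probability.
    return CLASSES[bisect_right(BREAKS, probability)]


for p in (0.05, 0.2, 0.3, 0.6, 0.95):
    print(p, susceptibility_class(p))
```

For raster data, the same mapping could be applied to a whole array at once with a vectorized binning routine.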

In Figure 9 and Table 3, the following results can be observed.

RF model: High- and very-high-susceptibility zones had the smallest areas, accounting for 6.22% and 5.39% of the study area, respectively, and a total of 48 hazard points fell into the high-susceptibility zone and above, with 16 hazard points falling within the medium-susceptibility area and 25 hazard points falling within the low-susceptibility area and below. The accuracy of the hazard points falling within the medium-susceptibility area and above was the highest of the three models.

SVM model: High- and very-high-susceptibility areas accounted for 9.44% and 14.05% of the study area, and had the advantage of being the most sensitive to geological hazards among the three models, with a total of 46 hazard points falling within high-susceptibility areas and above, 12 hazard points falling within the medium-susceptibility area, and 31 hazard points falling within the low-susceptibility area and below. The accuracy of hazard points falling within the medium-susceptibility area and above was the second highest of the three models, slightly lower than RF.

NB model: High- and very-high-susceptibility zones accounted for 10.68% and 10.55% of the study area, and were similar to the SVM model in terms of prediction results, but only 38 hazard points fell within the high-susceptibility zone and above, 15 hazard points fell within the medium-susceptibility area, and 36 hazard points fell within the low-susceptibility area and below. The accuracy of the hazard points that fell within the medium-susceptibility area and above was the lowest among the three models, and was significantly lower than that of the other two models.
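The comparison among the three models can be checked with simple arithmetic on the hazard-point counts reported above. Interpreting "accuracy" as the share of the 89 mapped hazard points that fall in the medium-susceptibility class or higher is an assumption of this sketch.

```python
# Hazard-point counts per model, as reported in the text.
counts = {
    "RF":  {"high_and_above": 48, "medium": 16, "low_and_below": 25},
    "SVM": {"high_and_above": 46, "medium": 12, "low_and_below": 31},
    "NB":  {"high_and_above": 38, "medium": 15, "low_and_below": 36},
}

# Share of hazard points captured by the medium class or higher.
share_medium_plus = {
    model: (c["high_and_above"] + c["medium"]) / sum(c.values())
    for model, c in counts.items()
}

for model, share in share_medium_plus.items():
    print(f"{model}: {share:.1%}")
# RF: 71.9%
# SVM: 65.2%
# NB: 59.6%
```

Each model's counts sum to the 89 surveyed hazard points, and the resulting order is RF, then SVM, then NB.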

Meanwhile, the three models were combined to show that the very-high-susceptibility zone in the Puge section of the Zemu River valley is mainly located along the Zemu River and its tributaries on both sides of the region, while the high- and medium-susceptibility zones mainly surround the very-high-susceptibility zone, and are also found along the river. Low-susceptibility areas and very-low-susceptibility areas are mainly located in mountainous areas of higher elevation, and human settlements and agriculture, such as the town of Luoyang and buckwheat plantations, are located in areas with a higher risk of geological hazards, and are thus more likely to be affected.

5. Discussion

5.1. Study on the Prediction Model of Geological Hazard Susceptibility

Previous studies on geohazard susceptibility have tended to rely on the a priori experience of experts; however, the use of algorithmic models avoids the manual determination of weights and also improves efficiency. With the development of information technology, more and more machine learning models have been used in the analysis of geological hazard susceptibility. Tian [43] and Liu [36] used the CF-logistic regression model to analyze geological hazard susceptibility, with AUC values of 0.782 and 0.814, respectively. This model uses existing data to establish a regression formula for the classification boundary; it is less complex but prone to underfitting, and its classification accuracy is not high enough. At the same time, more researchers have turned to deep learning for the study of geohazard susceptibility, such as Xia [55], who used an ANN model with a prediction rate of 0.837. ANN, CNN, RNN, and other deep learning models have the advantages of strong learning ability and high accuracy, but their disadvantages are that the learning time is long, a very large number of parameters is required, the learning process cannot be observed, and the classification results are difficult to interpret. For these reasons, such models are not feasible for a study area that lacks a large amount of supporting data.

This paper applies three machine learning models to the same dataset for geohazard susceptibility analysis, which allows the differences between multiple models in the same area to be compared fully. Among them, SVM and RF have been widely used for landslide prediction [56,57], while NB has rarely been used for geohazard susceptibility analysis. Comparing the model accuracy results, the RF results are better than those of the models discussed above, while the results of SVM and NB are close to them. Therefore, the RF algorithm has higher model accuracy when applied to geological hazard susceptibility and risk prediction.

In the prediction results, the zones of high susceptibility and above account for 11.63% of the total study area under RF, compared with 23.53% under NB and 21.27% under SVM. The SVM and NB prediction results are thus more similar to each other than to those of RF.

Overall, all three models performed well in geological hazard susceptibility analysis, and the RF model performed relatively well in the generation of the geological hazard susceptibility zoning maps, which were subsequently evaluated.

5.2. Differences in the Importance of Hazard-Causing Factors

It is worth noting that the RF cannot control the inner workings of the model, and can only be tried between different parameters and random seeds; thus, the choice of different parameters leads to different results [28]. In contrast, while the NB model assumes that the different factors are independent of each other, this assumption is often not valid, and there is some association between the causative factors [58]. The variability of geological and environmental conditions in different study areas, as well as differences in the intensity of human engineering activities, make it normal for the importance ranking of the hazard factors to vary across the conclusions of many studies. For example, when Yu [59] used a logistic regression model to evaluate the susceptibility of a region to shallow landslides, he found that slope was an intrinsic condition required for landslides to occur. In another study, Mohammed Amin Benbouras [60] used a neural network algorithm to evaluate landslide susceptibility, and the results showed that the distance to rivers was a key factor affecting landslide susceptibility, while lithology and distance to roads also affected landslide susceptibility. Chen [61] used an information quantity evaluation model (RS-IVM) method based on rough set theory, and concluded that the factors that have the greatest weight on the development of debris flow are slope, slope direction, and vegetation cover. Furthermore, Li [46] used two machine learning algorithms, namely, RF and SVM, for debris flow susceptibility analysis, and concluded that fracture zone density is also the main control factor for mudflow formation. These studies are consistent in that their results demonstrate that lithology, slope, and distance to water systems are the main factors that induce landslides and geological mudflow hazards, and these conclusions are consistent with the findings of this paper. 
Regarding other studies in similar regions, Wu [28] used the RF algorithm to analyze landslide susceptibility in Muli County; that study differs from the present one in that it used 10 factors, including rainfall, and therefore evaluated more parameters but only a single hazard type. Li [26] coupled a weighted informativeness model with the deterministic coefficient model in Puge County and obtained geological hazard susceptibility zoning results close to those of this paper. In addition, the parameters used in this paper take into account the influence of human construction and the reinforcing effect of vegetation on rock and soil masses.
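
The independence assumption of the NB model discussed in this subsection can be screened before modeling. The following is a minimal sketch under stated assumptions: the factor table is synthetic, the "river" column is deliberately constructed to depend on "slope", and the 0.7 threshold is a hypothetical choice rather than a value used in this study. It lists factor pairs whose absolute Pearson correlation exceeds the threshold.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)
n = 500

# Synthetic factor table (hypothetical values, not the study data).
slope = rng.uniform(0, 60, n)
factors = pd.DataFrame({
    "slope": slope,
    # Built to correlate with slope so that the check has something to find.
    "river": 2000 - 25 * slope + rng.normal(0, 150, n),
    "NDVI": rng.uniform(-0.1, 0.9, n),
    "fault": rng.uniform(0, 5000, n),
})

def correlated_pairs(df, threshold=0.7):
    """Return (factor_a, factor_b, r) for pairs whose |Pearson r| exceeds the threshold."""
    corr = df.corr()  # Pearson correlation by default
    cols = list(corr.columns)
    pairs = []
    for i, a in enumerate(cols):
        for b in cols[i + 1:]:
            r = float(corr.loc[a, b])
            if abs(r) > threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs

pairs = correlated_pairs(factors)
print(pairs)
```

Pearson correlation only captures linear association between continuous factors; for categorical factors such as rock group or land cover, a different association measure would be needed.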

When researchers analyze landslide susceptibility, they rarely consider the occurrence and interaction of multiple geological hazards in complex environments, such as mudflows, shallow landslides, and other hazards that have already occurred. According to Ma [44], geological hazard susceptibility results can be improved if other hazards are simulated and predicted simultaneously in regional studies. When susceptibility to multiple geological hazards is studied, the factors used for debris flow susceptibility analysis sometimes differ from those used for landslide susceptibility analysis. For example, Li [62] used both the topographic wetness index (TWI) and the stream power index (SPI) for mudflows when studying integrated geohazard susceptibility; the landslide susceptibility results were better, whereas the mudflow results were poor. This study takes the view that, although landslides and mudflows are two different types of mass movement, in many cases they occur together and transform rapidly into one another, which makes them difficult to distinguish, so the same natural hazard phenomenon in the same area is considered a landslide by some researchers and a mudflow by others [63]. Meanwhile, most debris flows in the study area are of the gully type because of its alpine valley geomorphology [25]. Shallow landslides occur on multiple slopes in gullies and accumulate in gully channels, and flash floods carry these accumulations downstream to form mudflows [17].
Therefore, this paper argues that the formation zones of shallow landslides and debris flows in such areas share similar hazard-causing factors, and that evaluating the two common geological hazards together is more conducive to grasping their common developmental patterns. The limitation of this approach is that only factors that influence the formation of both hazards can be selected for susceptibility evaluation, so differences in susceptibility between debris flows and landslides under the influence of a single factor are easily overlooked. For example, the two differ in the role that water plays in their formation: the formation of a mudflow depends more on water, while the formation of a landslide depends more on the wetness of the ground surface and requires less water. These issues can be further explored.

5.3. Shortcomings and Prospects

Shortcomings: The landslide data in this paper were first obtained by remote sensing interpretation to identify the extent of existing geological hazards in the study area, and the results were then verified through the field geological hazard survey in the Zemu River valley, Puge County. Owing to the limited spatial resolution of the imagery, small-scale geological hazards may not have been accurately identified. In addition, this study only includes the geological hazard sites recorded in 2022, and landslide statistics from a single year do not provide a comprehensive inventory. Obtaining more comprehensive landslide data therefore remains an important challenge for the preparation of landslide susceptibility, hazard, and risk assessment maps [64].

Outlook: This study selected suitable hazard-causing factors based on the typical alpine valley geomorphology of the Puge section of the Zemu River valley and used three machine learning methods to conduct the geological hazard susceptibility analysis. All three methods can provide technical support for geological hazard evaluation in similar areas, and the results can also serve as a reference for disaster prevention and mitigation in the corresponding areas.

6. Conclusions

In this paper, three machine learning models, namely, RF, SVM, and NB, were used to establish an efficient, fast, and accurate method for evaluating the susceptibility of a region to geological hazards based on eight hazard-causing factors covering geological environment conditions and human engineering activities, and the accuracy and results of the models were compared and analyzed. The results obtained by these methods are generally consistent, although each has advantages and disadvantages in terms of accuracy and sensitivity to geological hazards. The risks and potential impacts of geological hazards in the study area were successfully quantified. The model comparison results show that the RF model has the best prediction accuracy and generalization performance and can most objectively and accurately analyze the susceptibility to geological hazards in the study area, that the SVM model has intermediate prediction accuracy, and that the NB model has the lowest prediction accuracy.
The results of the geological hazard susceptibility analysis show that the very-high-susceptibility areas in the Puge section of the Zemu River valley are mainly distributed along the Zemu River and its tributaries on both sides, and the high- and moderate-susceptibility areas are mainly distributed around the very-high-susceptibility areas and are thus also located along the river. Low-susceptibility and very-low-susceptibility areas are mainly located in mountainous areas of higher elevation. Human settlements and agricultural land, such as those around the towns of Luojishan and Qiaowo, are located in areas of higher susceptibility and are therefore more likely to be affected by geological hazards.
The results of the model analysis and evaluation show that the two factors of rock group and distance to river have the greatest influence on the development of geological hazards. The three factors of distance to fault, aspect, and slope are the second most important, and land cover, NDVI, and distance to road play the smallest roles in creating geological hazards. The structure of soft rock formations is loose, and the material readily accumulates water, while gullies and rivers are scattered throughout the mountains and cut through them, forming numerous slopes and cut faces with enough space for sliding. This creates the basic conditions for shallow landslides and debris flows to occur. It is therefore suggested that, during engineering construction and urban planning, loose rock formations be reinforced or avoided when siting important buildings, and that river regulation, embankment construction, and reservoir construction be strengthened in order to minimize the occurrence of shallow landslide and debris flow hazards.

Author Contributions

M.L.: methodology and writing—original draft preparation. L.L.: software. Y.L.: construction. L.H. and Z.H.: writing—review and editing. Z.W.: resources. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Independent Research Project of the State Key Laboratory of Geohazard Prevention and Geoenvironment Protection (SKLGP2021Z003) and by the Natural Science Foundation of Sichuan Province (2022NSFSC1040).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The authors confirm that all relevant data of the present research are included in the article.

Acknowledgments

We thank the State Key Laboratory of Geohazard Prevention and Geoenvironment Protection, Chengdu University of Technology, and Xichang University for their support. This study is part of the research activities carried out by the first author during his Ph.D. The authors express sincere thanks to the anonymous reviewers for critical evaluation and constructive suggestions to improve the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

import numpy as np
import scipy as sp
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
import os
# from sklearn.externals import joblib
import joblib
import matplotlib as mpl
import matplotlib.pyplot as plt
import seaborn as sns
import pydotplus
from sklearn.naive_bayes import GaussianNB
from sklearn import svm

if __name__ == '__main__':
    pd.set_option('display.width', 400)
    pd.set_option('display.expand_frame_repr', False)
    pd.set_option('display.max_columns', 70)

    # Eight hazard-causing factors followed by the label column ('disaster').
    column_names = ('rock', 'road', 'river', 'fault',
                    'aspect', 'slope', 'land cover', 'NDVI', 'disaster')
    column_names_xy = ('x', 'y')
    path = 'D:\\'
    model_file = path + 'geological.pkl'
    model_filebn = path + 'geologicalbn.pkl'
    model_filesvc = path + 'geologicalsvc.pkl'
    show_result = True

    # Read the training, test, and whole-area samples and the cell coordinates.
    print('read')
    data = pd.read_csv(path + 'train.data', header=None, names=column_names)
    datat = pd.read_csv(path + 'test.data', header=None, names=column_names)
    dataw = pd.read_csv(path + 'yucw.data', header=None, names=column_names)
    dataxy = pd.read_csv(path + 'xy.data', header=None, names=column_names_xy)
    num_data = len(data)
    num_datat = len(datat)
    num_dataw = len(dataw)
    print(num_data, num_datat, num_dataw)

    # Encode all three tables together so that categories map to the same codes.
    dfallbn = pd.concat([dataw, datat, data], axis=0)
    for name in dfallbn.columns:
        dfallbn[name] = pd.Categorical(dfallbn[name]).codes
    print(dfallbn, len(dfallbn))

    # Split the encoded table back into its three parts.
    dataw = dfallbn.iloc[0:num_dataw]
    datat = dfallbn.iloc[num_dataw:num_datat + num_dataw]
    data = dfallbn.iloc[num_datat + num_dataw:num_dataw + num_datat + num_data]
    print(dataw)
    print(datat)
    print(data)

    # Features are all columns except the last; the last column is the label.
    x = data[data.columns[:-1]]
    y = data[data.columns[-1]]
    xt = datat[data.columns[:-1]]
    yt = datat[data.columns[-1]]
    xw = dataw[data.columns[:-1]]
    yw = dataw[data.columns[-1]]
    x_train, x_valid, y_train, y_valid = train_test_split(x, y, test_size=0.5, random_state=5)

    # Reuse saved models when they exist; otherwise train and save them.
    if os.path.exists(model_filebn):
        model = joblib.load(model_file)
        bayes_model = joblib.load(model_filebn)
        svc_model = joblib.load(model_filesvc)
    else:
        print('model')
        model = RandomForestClassifier(n_estimators=380, max_features=5, criterion='gini',
                                       max_depth=11, min_samples_split=5)
        model.fit(x_train, y_train)
        print('rf over')

        bayes_model = GaussianNB()
        bayes_model.fit(x_train, y_train)
        bayes_result = bayes_model.predict_proba(x_train)
        print("bayes_result", bayes_result)
        print('bn over')

        svc_model = svm.SVC(probability=True)
        svc_model.fit(x_train, y_train)
        svc_result = svc_model.predict(x_train)
        print(svc_result)
        print('svc over')

        joblib.dump(svc_model, model_filesvc)
        if show_result:
            # Predictions are made on x_valid, so they are scored against y_valid.
            y_valid_predsvc = svc_model.predict(x_valid)
            print('\t ACC svc:', accuracy_score(y_valid, y_valid_predsvc))
            print('\t Precision svc:', precision_score(y_valid, y_valid_predsvc))
            print('\t recall svc:', recall_score(y_valid, y_valid_predsvc))
            print('\t F1svc:', f1_score(y_valid, y_valid_predsvc))

        joblib.dump(bayes_model, model_filebn)
        if show_result:
            # Predictions are made on x_valid, so they are scored against y_valid.
            y_valid_predbn = bayes_model.predict(x_valid)
            print('\t ACC bn:', accuracy_score(y_valid, y_valid_predbn))
            print('\t Precision bn:', precision_score(y_valid, y_valid_predbn))
            print('\t recall bn:', recall_score(y_valid, y_valid_predbn))
            print('\t F1 bn:', f1_score(y_valid, y_valid_predbn))

        joblib.dump(model, model_file)
        if show_result:
            # The RF scores here are computed on the training split.
            y_train_pred = model.predict(x_train)
            print('\t ACC sl:', accuracy_score(y_train, y_train_pred))
            print('\t Precision sl:', precision_score(y_train, y_train_pred))
            print('\t recall sl:', recall_score(y_train, y_train_pred))
            print('\t F1 sl:', f1_score(y_train, y_train_pred))

    # Class predictions on the test samples.
    ytp = model.predict(xt)
    ytpbn = bayes_model.predict(xt)
    ytpsvc = svc_model.predict(xt)

    print('y_test_proba_sl = ', model.predict_proba(xt))
    print('y_test_proba_svm = ', svc_model.predict_proba(xt))
    print('y_test_proba_bn = ', bayes_model.predict_proba(xt))
    print('y_w_proba_sl = ', model.predict_proba(xw))
    print('y_w_proba_svm = ', svc_model.predict_proba(xw))
    print('y_w_proba_bn = ', bayes_model.predict_proba(xw))

    # Probability of the positive (hazard) class on the validation split.
    y_proba_sl = model.predict_proba(x_valid)
    y_proba_sl = y_proba_sl[:, 1]
    y_proba_svm = svc_model.predict_proba(x_valid)
    y_proba_svm = y_proba_svm[:, 1]
    y_proba_bn = bayes_model.predict_proba(x_valid)
    y_proba_bn = y_proba_bn[:, 1]

    # Probability of the positive (hazard) class on the test samples.
    y_proba_slt = model.predict_proba(xt)
    y_proba_slt = y_proba_slt[:, 1]
    y_proba_svmt = svc_model.predict_proba(xt)
    y_proba_svmt = y_proba_svmt[:, 1]
    y_proba_bnt = bayes_model.predict_proba(xt)
    y_proba_bnt = y_proba_bnt[:, 1]

    y_pre_sl = model.predict(x_valid)
    y_pre_svm = svc_model.predict(x_valid)
    y_pre_bn = bayes_model.predict(x_valid)

    # ROC curves and AUC on the validation split.
    fpr_sl, tpr_sl, thresholds_sl = metrics.roc_curve(y_valid, y_proba_sl)
    fpr_svm, tpr_svm, thresholds_svm = metrics.roc_curve(y_valid, y_proba_svm)
    fpr_bn, tpr_bn, thresholds_bn = metrics.roc_curve(y_valid, y_proba_bn)
    auc_sl = metrics.auc(fpr_sl, tpr_sl)
    auc_svm = metrics.auc(fpr_svm, tpr_svm)
    auc_bn = metrics.auc(fpr_bn, tpr_bn)

    # ROC curves and AUC on the test samples.
    fpr_slt, tpr_slt, thresholds_slt = metrics.roc_curve(yt, y_proba_slt)
    fpr_svmt, tpr_svmt, thresholds_svmt = metrics.roc_curve(yt, y_proba_svmt)
    fpr_bnt, tpr_bnt, thresholds_bnt = metrics.roc_curve(yt, y_proba_bnt)
    auc_sl_t = metrics.auc(fpr_slt, tpr_slt)
    auc_svm_t = metrics.auc(fpr_svmt, tpr_svmt)
    auc_bn_t = metrics.auc(fpr_bnt, tpr_bnt)

    print('AUC_sl = ', auc_sl)
    print('AUC_svm = ', auc_svm)
    print('AUC_bn = ', auc_bn)
    print('AUC_slt = ', auc_sl_t)
    print('AUC_svmt = ', auc_svm_t)
    print('AUC_bnt = ', auc_bn_t)

    from sklearn.metrics import mean_absolute_error
    from sklearn.metrics import mean_squared_error, r2_score

    def evaluation(y_test, y_predict):
        # Return (RMSE, MSE) between the labels and the predicted probabilities.
        mse = mean_squared_error(y_test, y_predict)
        rmse = np.sqrt(mse)
        return rmse, mse

    print(evaluation(y_valid, y_proba_sl))
    print(evaluation(y_valid, y_proba_svm))
    print(evaluation(y_valid, y_proba_bn))

    mpl.rcParams['font.sans-serif'] = 'SimHei'
    mpl.rcParams['axes.unicode_minus'] = False

    # Plot the ROC curves of the three models on the test samples.
    plt.figure(facecolor='w')
    plt.plot(fpr_slt, tpr_slt, 'r-', lw=2, alpha=0.8, label='RF AUC = %.3f' % auc_sl_t)
    plt.plot(fpr_bnt, tpr_bnt, 'b-', lw=2, alpha=0.8, label='Gaussian Bayes AUC = %.3f' % auc_bn_t)
    plt.plot(fpr_svmt, tpr_svmt, 'g-', lw=2, alpha=0.8, label='SVM AUC = %.3f' % auc_svm_t)
    # Random-guess reference line.
    plt.plot((0, 1), (0, 1), c='b', lw=1.5, ls='--', alpha=0.7)
    plt.xlim((0, 1))
    plt.ylim((0, 1))
    plt.xticks(np.arange(0, 1.1, 0.2))
    plt.yticks(np.arange(0, 1.1, 0.2))
    plt.xlabel('False positive rate', fontsize=14)
    plt.ylabel('True positive rate', fontsize=14)
    plt.grid(visible=True)
    plt.legend(loc='lower right', fancybox=True, framealpha=0.8, fontsize=14)
    plt.title(' ', fontsize=17)
    plt.show()

    print('y_c_proba_sl = ', model.predict_proba(xw))
    print('y_c_proba_svm = ', svc_model.predict_proba(xw))
    print('y_c_proba_bn = ', bayes_model.predict_proba(xw))

    # Attach the predicted probabilities for the whole area to the cell coordinates.
    df_data = pd.DataFrame(dataw)
    dfbn = pd.DataFrame(bayes_model.predict_proba(xw))
    dfsl = pd.DataFrame(model.predict_proba(xw))
    dfsvc = pd.DataFrame(svc_model.predict_proba(xw))
    print(dfbn)
    print(dataxy)
    dfallsl = pd.concat([dfsl, dataxy], axis=1)
    dfallsvc = pd.concat([dfsvc, dataxy], axis=1)
    dfallbn = pd.concat([dfbn, dataxy], axis=1)

    # Export one spreadsheet per model.
    dfallsl.to_excel(path + 'resultsl.xlsx', sheet_name="sl", index=False, na_rep=0, inf_rep=0)
    dfallsvc.to_excel(path + 'resultsvc.xlsx', sheet_name="svc", index=False, na_rep=0, inf_rep=0)
    dfallbn.to_excel(path + 'resultbn.xlsx', sheet_name="bn", index=False, na_rep=0, inf_rep=0)
    print('over')

References

1. Guo, N. Analysis of Cause and Stability on the Yaojiashan Landslide in the Puge County. Sci. Technol. Eng. 2014, 14, 114–118.
2. Wang, W.P.; Han, A.G.; Ren, G.M.; Yang, L.; Huang, W.F. Sensitivity Analysis of Hazard-brewing Environmental Factors of Landslides in Puge County of Sichuan Province. J. Yangtze River Sci. Res. Inst. 2018, 35, 63–67, 97.
3. Xiong, X.H.; Wang, C.L.; Bai, Y.J.; Tie, Y.B.; Gao, Y.C.; Li, G.H. Comparative analysis of landslide susceptibility evaluation in counties based on different coupling models: A case study of Puge County, Sichuan Province. Chin. J. Geol. Hazard Control 2022, 33, 114–124.
4. Xie, J.Z.; Feng, W.K.; Yang, S.S.; Li, C.S.; Hu, Y.P.; Wang, Q. Active Characteristics and Geohazard of Zemuhe Fault and Their Influence on Morphological Evolution in Ezhang River. J. Eng. Geol. 2017, 25, 772–783.
5. Neaupane, K.M.; Piantanakulchai, M. Analytic network process model for landslide hazard zonation. Eng. Geol. 2006, 85, 281–294.
6. Sun, D.; Xu, J.; Wen, H.; Wang, D. Assessment of landslide susceptibility mapping based on Bayesian hyperparameter optimization: A comparison between logistic regression and random forest. Eng. Geol. 2021, 281, 105972.
7. Phuong, T.; Mahdi, P.; Khabat, K.; Omid, G.; Narges, K.; Artemi, C.; Saro, L. Evaluation of deep learning algorithms for national scale landslide susceptibility mapping of Iran. Geosci. Front. 2021, 12, 505–519.
8. Mandal, B.; Mandal, S. Analytical hierarchy process (AHP) based landslide susceptibility mapping of Lish river basin of eastern Darjeeling Himalaya, India. Adv. Space Res. 2018, 62, 3114–3132.
9. Guzzetti, F.; Carrara, A.; Cardinali, M.; Reichenbach, P.; Giardino, J.R.; Marston, D.; Morisawa, M. Landslide hazard evaluation: A review of current techniques and their application in a multi-scale study, central Italy. Geomorphology 1999, 31, 181–216.
10. Hu, K.H.; Cui, P.; Han, Y.S.; You, Y. Evaluation of debris flow and landslide susceptibility in Wenchuan disaster area based on clustering and maximum likelihood method. Sci. Soil Water Conserv. 2012, 10, 12–18.
11. Wang, J.; Guo, J.; Wang, W.D.; Fang, L.G. Application and comparison of weighted linear combination model and logistic regression model in landslide susceptibility mapping. J. Cent. South Univ. 2012, 43, 1932–1939.
12. Huang, F.M.; Yin, K.L.; Jiang, S.H.; Huang, J.S.; Cao, Z.S. Landslide susceptibility evaluation based on cluster analysis and support vector machine. Chin. J. Rock Mech. Eng. 2018, 37, 156–167.
13. Ali, S.; Hoseyn, S.; Cristina, C.; Mohammad, H.; Marco, P.; David, M.; Nader, K.; Mohammad, H.; Larry, K.B. Using machine learning in photovoltaics to create smarter and cleaner energy generation systems: A comprehensive review. J. Clean. Prod. 2022, 364, 132701.
14. Anne, E.; Geert, A.; Anne, E.; Abigail, C.; Joost, W.; Charles, M.; Job, N.; Andrew, D.; Carel, J.C.G.; Alasdair, G.; et al. A Machine Learning Algorithm to Estimate the Probability of a True Scaphoid Fracture After Wrist Trauma. J. Hand Surg. 2022, 47, 709–718.
15. Fahri, A.; Şerif, B. Data poisoning attacks against machine learning algorithms. Expert Syst. Appl. 2022, 208, 118101.
16. Zhu, M.Y.; Wang, J.W.; Yang, X.; Zhang, Y.; Zhang, L.Y.; Ren, H.Q.; Wu, B.; Ye, L. A review of the application of machine learning in water quality evaluation. Eco-Environ. Health 2022, 1, 107–116.
17. Chen, W.H.; Yu, B.; Liu, K.; Ye, L.Z.; Ma, Y. Fast recognition method for debris flows caused by shallow landslides. Yangtze River 2023, 54, 152–158.
18. Wu, W.C.; Claudio, Z.; Fadi, K.; Liu, G.P. Enhancing the performance of regional land cover mapping. Int. J. Appl. Earth Obs. Geoinf. 2016, 52, 422–432.
19. Azarafza, M.; Azarafza, M.; Akgün, H.; Atkinson, P.M.; Derakhshani, R. Deep learning-based landslide susceptibility mapping. Sci. Rep. 2021, 11, 24112.
20. Trigila, A.; Iadanza, C.; Esposito, C.; Scarascia-Mugnozza, G. Comparison of Logistic Regression and Random Forests techniques for shallow landslide susceptibility assessment in Giampilieri (NE Sicily, Italy). Geomorphology 2015, 249, 119–136.
21. Chen, W.; Sun, Z.; Han, J. Landslide Susceptibility Modeling Using Integrated Ensemble Weights of Evidence with Logistic Regression and Random Forest Models. Appl. Sci. 2019, 9, 171.
22. Li, Z.T.; Wang, T.; Zhou, Y.; Liu, J.M.; Xin, P. Landslide susceptibility assessment based on information content, logistic regression and coupling model: A case study of the Shatangchuan Watershed in Qinghai Province. Geoscience 2019, 33, 235–245.
23. Zhang, Z.Y.; Deng, M.G.; Xu, S.G.; Zhang, Y.B.; Fu, H.L.; Li, Z.H. Comparison of landslide susceptibility assessment models in Zhenkang County, Yunnan Province, China. Chin. J. Rock Mech. Eng. 2022, 41, 157–171.
24. Wen, X.Z. Character of Rupture Segmentation of the Xianshuihe-Anninghe-Zemuhe Fault Zone, Western Sichuan. J. Seismol. 2000, 3, 239–249.
25. Feng, W.K.; Yang, Q.; Yang, X.; Xie, J.Z.; Li, C.S.; Zhou, Q. Study of Disaster Effect and Disaster Mitigation Model of Zemuhe Fault Zone. J. Eng. Geol. 2018, 26, 939–950.
26. Li, G.H.; Tie, Y.B.; Bai, Y.J.; Xiong, X.H. Distribution and susceptibility assessment of geological hazards in Zemuhe fault zone (Puge section). Chin. J. Geol. Hazard Control 2022, 33, 123–133.
27. Satyanaga, A.; Rahardjo, H. Role of unsaturated soil properties in the development of slope susceptibility map. Proc. Inst. Civ. Eng.-Geotech. Eng. 2022, 175, 276–288.
28. Wu, X.Y.; Song, Y.B.; Chen, W.; Kang, G.C.; Qu, R.; Wang, Z.F.; Wang, J.X.; Lv, P.Y.; Chen, H. Analysis of Geological Hazard Susceptibility of Landslides in Muli County Based on Random Forest Algorithm. Sustainability 2023, 15, 4328.
29. Wei, W.H.; Jia, Y.F.; Sheng, Y.F.; Xu, G.L.; Yang, Y.J.; Zhang, R.D. Research on Landslide Susceptibility Evaluation Model Based on I, SVM and I-SVM. Saf. Environ. Eng. 2023, 30, 136–144.
30. Jia, Y.F.; Wei, W.H.; Chen, W.; Yang, Q.Z.; Sheng, Y.F.; Xu, G.L. Landslide susceptibility assessment based on the SOM-I-SVM model. Hydrogeol. Eng. Geol. 2023, 50, 125–137.
31. Wang, X.D.; Zhang, C.B.; Wang, C.; Zhu, Y.D.; Wang, H.P. Geological Disaster Susceptibility in Helong City Based on Logistic Regression and Random Forest. J. Jilin Univ. Earth Sci. Ed. 2022, 52, 1957–1970.
32. Ma, X.; Wang, N.Q.; Li, X.K.; Yan, D.; Li, J.L. Assessment of Landslide Susceptibility Based on RF-FR Model: Taking Lueyang County as an Example. Northwest. Geol. 2022, 55, 335–344.
33. Zhang, X.L.; Wang, M.; Cao, Y.X.; Liu, K.; Hong, C.Y. Comparison of three typical machine learning methods in susceptibility assessment of disasters. J. Saf. Sci. Technol. 2018, 14, 79–85.
34. Guo, F.F.; Yang, N.; Meng, H.; Zhang, Y.Q.; Ye, B.Y. Application of the relief amplitude and slope analysis to regional landslide hazard assessments. Geol. China 2008, 324, 131–143.
35. Liu, X.J.; Gong, J.Y.; Zhou, Q.M.; Tang, G.A. Analysis and Research on the Accuracy of Slope and Aspect Algorithm Based on DEM. Acta Geod. Cartogr. Sin. 2004, 3, 258–263.
36. Liu, L.Y.; Gao, H.Y.; Li, Z. Landslide susceptibility assessment in Yongjia County based on the coupling of CF and Logistic regression model. J. Ocean Univ. China Nat. Sci. Ed. 2021, 51, 121–129.
37. Zhang, Y.H.; Nie, L.; Wang, S.; Wang, B.; Pang, Z.J.; Xiong, S.H. Study on disaster characteristics of reservoir bank reconstruction in Jinning section of Yipan Expressway. J. Yangtze River Sci. Res. Inst. 2020, 37, 67–73+91.
38. Mu, C.L.; Pei, X.J.; Wang, R.; Wang, C. Analysis of deformation and failure characteristics of high slope with multi-layer weak interlayer excavation based on physical model test. Chin. J. Geol. Hazard Control 2022, 33, 61–67.
39. Huang, R.Q.; Xu, Z.M.; Xu, M. The disaster effect of groundwater and geological hazard induced by abnormal groundwater flow. Earth Environ. 2005, 3, 1–9.
40. Feng, W.; Bi, Y.Q.; Tang, Y.M.; Zhang, L.Z.; Li, Z.G. Study on the distribution law and fault effect of geological disasters along the Lixian-Luojiapu fault zone in Gansu Province. J. Nat. Disasters 2021, 30, 183–190.
41. Zhang, Z.J.; Huang, X.; Cai, Y.W.; Fu, J.Y.; Zhu, Y.; Yang, R.; Han, C.Q. The evolution pattern of landslide disaster driving factors and the influence of human activities in Wulong section of Three Gorges Reservoir area. Chin. J. Geol. Hazard Control 2022, 33, 39–50.
42. Dieu, T.B.; Biswajeet, P.; Owe, L.; Inge, R. Landslide Susceptibility Assessment in Vietnam Using Support Vector Machines, Decision Tree, and Naïve Bayes Models. Math. Probl. Eng. 2012, 2012, 974638.
43. Tian, C.S.; Liu, X.L.; Wang, L. Evaluation of geological disaster susceptibility in Guangdong Province based on CF and Logistic regression model. Hydrogeol. Eng. Geol. 2016, 43, 154–161+170.
44. Ma, Y.B.; Li, H.L.; Wang, L.; Zhang, W.G.; Zhu, Z.W.; Yang, H.Q.; Wang, L.Q.; Yuan, X.Z. Machine learning algorithms and techniques for landslide susceptibility investigation: A literature review. J. Civ. Environ. Eng. 2022, 44, 53–67.
45. Wang, N.Q.; Guo, Y.J.; Liu, T.M.; Zhu, Q.H. Assessment of Landslide Susceptibility Based on SVM-LR Model: A Case Study of Lintong District. Sci. Technol. Eng. 2019, 19, 62–69.
46. Li, K.; Zhao, J.L.; Lin, Y.L.; Chen, K.; Bi, R. Evaluation of debris flow susceptibility in Dongchuan based on RF and SVM models. J. Yunnan Univ. Nat. Sci. Ed. 2022, 44, 107–115.
47. Cortes, C.; Vapnik, V. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297.
48. Ge, W.; Cheng, Y.; Sun, Y.F. Research of geographical information service-Naive Bayes classification and classification matching. Eng. Surv. Mapp. 2013, 22, 5–9.
49. Pham, B.T.; Bui, D.T.; Dholakia, M.B.; Prakash, I.; Pham, H.V.; Mehmood, K.; Le, H.Q. A novel ensemble classifier of rotation forest and Naïve Bayer for landslide susceptibility assessment at the Luc Yen district, Yen Bai Province (Viet Nam) using GIS. Geomat. Nat. Hazards Risk 2017, 8, 649–671.
50. Fawcett, T. An introduction to ROC analysis. Pattern Recognit. Lett. 2006, 27, 861–874.
51. Deb, P.; Pal, S.K. Interaction behavior and load sharing pattern of piled raft using nonlinear regression and LM algorithm-based artificial neural network. Front. Struct. Civ. Eng. 2021, 15, 1181–1198.
52. Mohamed, A.S.; Holger, R.M.; Mark, B.J. Predicting Settlement of Shallow Foundations using Neural Networks. J. Geotech. Geoenviron. Eng. 2002, 128, 785–793.
53. Sabrina, C.Y.I.; Alfrendo, S.; Harianto, R. Spatial variation of shear strength properties incorporating auxiliary variables. Catena 2021, 200, 105196.
54. Panchal, S.; Shrivastava, A.K. Landslide hazard assessment using analytic hierarchy process (AHP): A case study of National Highway 5 in India. Ain Shams Eng. J. 2022, 13, 101626.
55. Xia, H.; Yin, K.L.; Liang, X.; Ma, F. Landslide Susceptibility Assessment Based on SVM-ANN Model: A Case Study of Wushan County in Three Gorges Reservoir Area. Chin. J. Geol. Hazard Control 2018, 29, 13–19.
56. Huang, F.; Yao, C.; Liu, W.; Li, Y.; Liu, X. Landslide susceptibility assessment in the Nantian area of China: A comparison of frequency ratio model and support vector machine. Geomat. Nat. Hazards Risk 2018, 9, 919–938.
57. Huang, S.X.; Dou, H.Q.; Jian, W.B.; Guo, C.X.; Sun, Y.X. Spatial prediction of the geological hazard vulnerability of mountain road network using machine learning algorithms. Geomat. Nat. Hazards Risk 2023, 14, 2170832.
58. Ma, G. Improvement and Application of Naive Bayes Algorithm. Master’s Thesis, Anhui University, Hefei, China, 2018.
59. Yu, Z.W.; Liu, K.; Yin, J.; Yu, B. A mesh-scale division method suitable for logistic regression model to evaluate the susceptibility of shallow landslides: A case study of the group shallow landslides in Sanming City, Fujian Province in 2019. Mt. Res. 2022, 40, 106–119.
60. Benbouras, M.A. Hybrid meta-heuristic machine learning methods applied to landslide susceptibility mapping in the Sahel-Algiers. Int. J. Sediment Res. 2022, 37, 601–618.
61. Chen, J.J.; Qin, S.W.; Li, G.J.; Peng, S.Y.; Ma, Q.; Cao, C.; Liu, X.; Zhai, J.J. Evaluation of the vulnerability of debris flow disaster in Jilin Province based on RS-IVM. J. Basic Sci. Eng. 2021, 29, 1359–1371.
62. Li, G.H.; Tie, Y.H. Comparative study on modeling methods of comprehensive geological hazard susceptibility based on information model. J. Catastrophol. 2023, 1–15. Available online: http://kns.cnki.net/kcms/detail/61.1097.P.20230225.2256.003.html (accessed on 12 May 2023).
63. Li, S.D. Formation Mechanism of the Landslide type Debris Flow. Acta Sci. Nat. Univ. Pekin. 1998, 4, 107–110.
64. Nikoobakht, S.; Azarafza, M.; Akgün, H.; Derakhshani, R. Landslide susceptibility assessment by using convolutional neural network. Appl. Sci. 2022, 12, 5992.

Figure 1. Study area location and geological hazard distribution map: (a) Liangshan Yi Autonomous Prefecture, Sichuan Province; (b) study area in Puge County; (c) distribution map of geological hazards in the study area.

Figure 2. Demonstration of hazard delineation work: (a) landslide near the debris flows in Caiazhe; (b) landslide near the debris flows in Wudaojing town; (c) debris flows in Zejialuobo village; (d) landslide in Luochangpin village; (e) mudslide in Qiaowo town; (f) landslide in front of Zemu River Bridge (the images were taken by UAV; the remote sensing image was provided by the Natural Resources Bureau of Puge County, GF-1).

Figure 3. Factors causing landslides and debris flows used in the study: (a) slope, (b) aspect, (c) rock group, (d) distance to road, (e) distance to river, (f) distance to fault, (g) land cover, (h) NDVI.

Figure 4. Random forest flow chart.

Figure 5. SVM principle diagram (revised according to Cortes et al. [47]). Decision boundary (red line); interval boundary (dashed line); support vector (red graphic).

Figure 6. Model work flow chart (the images were obtained from the Natural Resources Bureau of Puge County).

Figure 7. Receiver-operating characteristic curve. Random guess line (blue dotted line).

Figure 8. The weight of each factor obtained under the RF model.

Figure 9. Hazard risk forecast map of economic activity in Zemu Valley: (a) RF, (b) SVM, (c) NB.

Table 1. Data types and sources of factors causing disaster.

Factors	Types	Sources
Slope	Raster data (30 m)	GDEMV3 30M (http://www.gscloud.cn/) (accessed on 5 December 2022)
Aspect	Raster data (30 m)	GDEMV3 30M (http://www.gscloud.cn/) (accessed on 5 December 2022)
Rock group	Shapefile	Field data on geological hazards
Distance to road	Shapefile	Extracted using ArcGIS 10.8 software
Distance to river	Shapefile	Extracted using ArcGIS 10.8 software
Distance to fault	Shapefile	Extracted using ArcGIS 10.8 software
Land cover	Raster data (30 m)	www.globallandcover.com
NDVI	Raster data (30 m)	Landsat 8 OLI_TIRS Satellite data (http://www.gscloud.cn/)
Remote sensing	Raster data (2 m)	Natural Resources Bureau of Puge County (GF-1)

Table 2. Model accuracy comparison table.

Model	ACC	Precision	Recall	F1	RMSE	MAE
RF	0.984	0.987	0.981	0.984	0.118	0.045
SVM	0.888	0.930	0.840	0.882	0.277	0.154
NB	0.878	0.879	0.875	0.877	0.290	0.181
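
As a consistency check on Table 2, the F1 column can be recomputed from the precision and recall columns. The sketch below assumes the standard definition of F1 as the harmonic mean of precision and recall; the dictionary layout and the function name f1_score are illustrative choices and are not taken from the authors' code.

```python
# Precision, recall, and F1 as reported in Table 2 (rounded to three decimals).
reported = {
    "RF": {"precision": 0.987, "recall": 0.981, "f1": 0.984},
    "SVM": {"precision": 0.930, "recall": 0.840, "f1": 0.882},
    "NB": {"precision": 0.879, "recall": 0.875, "f1": 0.877},
}


def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    return 2 * precision * recall / (precision + recall)


# Recompute F1 for each model from its tabulated precision and recall.
recomputed = {
    model: f1_score(row["precision"], row["recall"])
    for model, row in reported.items()
}

for model, f1 in recomputed.items():
    print(f"{model}: recomputed F1 = {f1:.3f}, reported F1 = {reported[model]['f1']:.3f}")
```

Because the tabulated precision and recall are themselves rounded, a recomputed value may differ from the reported one in the third decimal place.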

Table 3. Comparison table of model hazard susceptibility analysis results.

Model	Very Low	Low	Moderate	High	Very High
RF	58.39 km²	19.41 km²	10.40 km²	6.22 km²	5.39 km²
SVM	55.51 km²	12.23 km²	8.78 km²	9.44 km²	14.05 km²
NB	46.53 km²	18.99 km²	13.28 km²	10.68 km²	10.55 km²
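
To compare the three models on a common scale, the areas in Table 3 can be converted to percentage shares of the mapped area. The following is a minimal sketch: the areas are copied from the table, while the names areas_km2, class_shares, and high_plus are hypothetical. The three rows sum to slightly different totals (99.81, 100.01, and 100.03 km²), so each row is normalized by its own sum.

```python
# Susceptibility classes in the column order of Table 3.
CLASSES = ["Very Low", "Low", "Moderate", "High", "Very High"]

# Area per class in km^2, copied from Table 3.
areas_km2 = {
    "RF": [58.39, 19.41, 10.40, 6.22, 5.39],
    "SVM": [55.51, 12.23, 8.78, 9.44, 14.05],
    "NB": [46.53, 18.99, 13.28, 10.68, 10.55],
}


def class_shares(areas):
    """Convert a list of class areas into percentage shares of their sum."""
    total = sum(areas)
    return [100.0 * area / total for area in areas]


# Percentage share of every class for every model.
shares = {model: class_shares(areas) for model, areas in areas_km2.items()}

# Combined share of the "High" and "Very High" classes (last two columns).
high_plus = {model: s[3] + s[4] for model, s in shares.items()}

for model in areas_km2:
    row = ", ".join(f"{c}: {p:.1f}%" for c, p in zip(CLASSES, shares[model]))
    print(f"{model} -> {row} | High + Very High: {high_plus[model]:.1f}%")
```

On these figures, SVM assigns the largest combined share to the High and Very High classes and RF the smallest.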

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

