Article

A Combination of Geographically Weighted Regression, Particle Swarm Optimization and Support Vector Machine for Landslide Susceptibility Mapping: A Case Study at Wanzhou in the Three Gorges Area, China

1 Faculty of Information Engineering, China University of Geosciences, Wuhan 430074, China
2 Institute of Geophysics and Geomatics, China University of Geosciences, Wuhan 430074, China
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Int. J. Environ. Res. Public Health 2016, 13(5), 487; https://doi.org/10.3390/ijerph13050487
Submission received: 23 February 2016 / Revised: 26 April 2016 / Accepted: 4 May 2016 / Published: 11 May 2016

Abstract
In this study, a novel coupling model for landslide susceptibility mapping is presented. In practice, environmental factors may have different impacts at a local scale within a study area. To provide better predictions, our method first uses a geographically weighted regression (GWR) technique to segment the study area into a series of prediction regions of appropriate size. A support vector machine (SVM) classifier is then applied in each prediction region for landslide susceptibility mapping. To further improve prediction performance, the particle swarm optimization (PSO) algorithm is used in each prediction region to obtain optimal parameters for the SVM classifier. To evaluate the prediction performance of our model, several SVM-based prediction models are compared on a study area in the Wanzhou district of the Three Gorges Reservoir. Experimental results, based on three objective quantitative measures and a visual qualitative evaluation, indicate that our model achieves better prediction accuracies and is more effective for landslide susceptibility mapping. For instance, our model achieves an overall prediction accuracy of 91.10%, which is 7.8%–19.1% higher than that of the traditional SVM-based models. In addition, the landslide susceptibility map obtained by our model shows a strong correlation between the classified very high-susceptibility zone and the previously investigated landslides.

1. Introduction

The area of the Three Gorges Reservoir along the Yangtze River is known for many active and reactivated landslides caused by periodic fluctuations of the reservoir water level [1], which pose a serious threat to life and property. By 2009, more than 3800 landslides had been recorded in this region [2]. It is therefore crucial to predict slope failures in the Three Gorges area.
Landslide susceptibility evaluation is a complex task [3]. Compared with traditional geological survey methods, such as landslide field reconnaissance, landslide spatial prediction is more convenient and efficient because it integrates geographical information systems (GIS) technology with statistical analysis. Landslide susceptibility mapping, i.e., the spatial prediction of landslide susceptibility, is considered one of the most important steps in landslide hazard mitigation and management [4], which has encouraged research into knowledge-driven and data-driven models [5]. Knowledge-driven models, such as the analytic hierarchy process (AHP) and fuzzy mathematics [5,6], are based on analysis of the landslide formation mechanism(s); expert experience and knowledge are used to choose the most important environmental factors of landslides and to assign their quantitative weights. Data-driven models, on the other hand, include logistic regression (LR) [7,8,9], artificial neural networks (ANN) [10,11,12,13], SVM [14,15,16,17] and geographically weighted regression (GWR) [18,19]. These models use overlay analysis to calculate the quantitative relationships between various environmental factors and the known distribution of landslides, and are therefore commonly used to determine the weights of predictors, i.e., values or indices of landslide susceptibility.
Because the support vector machine (SVM) can achieve satisfactory classification accuracy when only a limited number of training samples is available, it has been widely used for landslide susceptibility mapping [14,15,16,17,20,21]. However, the proper selection of a kernel function and its parameters, which can greatly influence the final prediction accuracy, is still an open problem. To obtain optimal parameters for SVM, some researchers have combined the particle swarm optimization (PSO) algorithm with the classical SVM model [22,23,24]. PSO is a population-based stochastic optimization technique developed by Eberhart and Kennedy [25], inspired by the social behavior of bird flocking and fish schooling. It has many similarities with evolutionary computation techniques such as Genetic Algorithms (GA) [26]; for instance, the system is initialized with a population of random solutions and searches for optima by updating generations. Unlike GA, however, PSO has no evolution operators such as crossover and mutation. In PSO, the potential solutions, called particles, fly through the problem space by following the current optimum particles. Compared with GA, PSO is easy to implement and has few parameters to adjust [27]. For landslide prediction, PSO can therefore be used to estimate optimal parameters for the SVM prediction model. For instance, Huang and Dun [22] proposed a PSO–SVM model to improve classification accuracy with an appropriate feature subset. One year later, Zhao and Yin [23] integrated SVM, PSO and numerical analysis techniques for intelligent displacement back analysis in geomechanical parameter identification. More recently, Ren et al. [24] presented a prediction method for the Shuping landslide using a PSO-SVM model and wavelet analysis.
However, these techniques have three drawbacks. First, the PSO algorithm easily falls into a local optimum, especially over a very large area. Second, spatial autocorrelation in the study area is not taken into account. Finally, these methods apply a single global model to the whole area and assume that the impacts of the environmental factors are the same across the entire region, so they cannot describe the local characteristics of landslide occurrence.
This paper presents an effective PSO-SVM model based on GWR for landslide susceptibility mapping. In practice, environmental factors may have different degrees of impact at a local scale within a study area [18], and their impacts generally vary with spatial location. Most variables in real-world applications tend to be moderately spatially autocorrelated because of the way phenomena are geographically organized [28,29]. Spatial autocorrelation measures the degree to which a set of spatial features and their associated data values tend to be clustered together or dispersed in space [30,31]. Many recent contributions have used GWR to account for spatial autocorrelation and have shown that GWR can be an effective estimator of it [32,33,34,35]. Inspired by these works, we first use the GWR technique to segment the study area into several prediction regions of proper size. To this end, each computing unit in the study area is assigned a GWR coefficient by using an appropriate kernel type and selection criterion. Each environmental factor is then divided into several classes by the natural breaks method, and by superposing these classification maps, the different degrees of impact of the environmental factors at a local scale are also taken into account. As a consequence, the GWR coefficients within each prediction region are similar, while they differ greatly between regions, i.e., spatial autocorrelation of the environmental factors between regions is greatly suppressed. Second, the PSO-SVM model is applied in each prediction region for landslide susceptibility mapping, with the PSO algorithm searching for the optimal SVM parameters in each region. In this way, the problem of local optima can be effectively overcome. In addition, the SVM model can be applied locally to each prediction region to produce accurate landslide susceptibility maps.
The remainder of this paper is organized as follows: Section 2 reviews the related techniques, namely GWR, PSO and SVM. Section 3 presents the proposed GWR-PSO-SVM model. Section 4 describes the study area and the data used in this work. Section 5 reports experiments, including comparisons between traditional SVM-based prediction models and ours. Section 6 discusses our model, and the last section presents our concluding remarks.

2. Related Techniques

2.1. Geographically Weighted Regression

Geographically weighted regression (GWR) is a fairly recent approach to modelling spatially heterogeneous processes [28,29,36,37] that has attracted much attention for its good performance in exploring local variations within a study area [18,38,39]. GWR fits a separate regression equation for each spatial zone [40], and its basic model can be written as:
$$y_i = \beta_0(u_i, v_i) + \sum_{k=1}^{Q} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i \tag{1}$$
where (u_i, v_i) denotes the coordinates of the ith sample in space (e.g., latitude and longitude), i = 1, 2, ⋯, L; L is the number of samples and Q is the number of explanatory variables. y_i is the dependent variable at location i, x_ik is the value of the kth explanatory variable at location i, β_k(u_i, v_i) is the local regression coefficient of the kth explanatory variable at location i, β_0(u_i, v_i) is the intercept at location i, and ε_i is the random error term. The weighted least squares estimate of the coefficient vector at location i is then:
$$\hat{\beta}_i = \left(X^{T} W_i X\right)^{-1} X^{T} W_i Y \tag{2}$$
and its variance is:
$$\operatorname{var}(\hat{\beta}_i) = \left(X^{T} W_i^{-1} X\right)^{-1} \tag{3}$$
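where W_i is an L × L diagonal matrix whose diagonal elements W_i1, …, W_iL are the geographical weights of the L samples with respect to location i:
$$W_i = \begin{bmatrix} W_{i1} & 0 & \cdots & 0 \\ 0 & W_{i2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & W_{iL} \end{bmatrix} \tag{4}$$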
The choice of W_i depends on the selected kernel function, which can be a fixed kernel (i.e., a fixed bandwidth) or an adaptive kernel (i.e., varying bandwidths) [41].
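As an illustration of Equations (2) and (4), the following minimal NumPy sketch computes the local coefficients at every sample location with an adaptive bi-square kernel, whose bandwidth is the distance to the kth nearest neighbour. The function names, the neighbour count k and the synthetic data are assumptions for illustration, not the implementation used in this study; dedicated GWR software would also select the bandwidth automatically.

```python
import numpy as np


def bisquare_weights(dist, bandwidth):
    """Bi-square kernel: w = (1 - (d / b)^2)^2 for d < b, and 0 otherwise."""
    w = (1.0 - (dist / bandwidth) ** 2) ** 2
    return np.where(dist < bandwidth, w, 0.0)


def gwr_coefficients(coords, X, y, k):
    """Local estimates beta_hat_i = (X^T W_i X)^{-1} X^T W_i y at every sample.

    coords: (L, 2) sample coordinates (u_i, v_i)
    X:      (L, Q) explanatory variables; an intercept column is prepended
    y:      (L,) dependent variable
    k:      neighbour count that sets the adaptive bandwidth (k < L)
    Returns an (L, Q + 1) array: column 0 is beta_0, columns 1..Q are beta_k.
    """
    L = X.shape[0]
    Xd = np.column_stack([np.ones(L), X])
    betas = np.empty((L, Xd.shape[1]))
    for i in range(L):
        dist = np.linalg.norm(coords - coords[i], axis=1)
        # Adaptive bandwidth: distance to the k-th nearest neighbour
        # (index 0 of the sorted distances is the point itself).
        bandwidth = np.sort(dist)[k]
        w = bisquare_weights(dist, bandwidth)
        XtW = Xd.T * w  # same as X^T W_i, without building the L x L matrix
        betas[i] = np.linalg.solve(XtW @ Xd, XtW @ y)
    return betas
```

With data generated from a single global linear relation, every local estimate recovers the same coefficients; spatially varying relationships would instead produce coefficients that change from location to location, which is what the GWR coefficient maps in this work visualize.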
In practice, GWR results have been found to be relatively insensitive to the choice between the Gaussian and bi-square weighting functions, but sensitive to the bandwidth of the chosen function. Based on the maximum likelihood principle, Akaike [42] proposed a general model selection criterion, the Akaike Information Criterion (AIC):
$$\mathrm{AIC} = -2\ln L(\hat{\theta}_L, x) + 2q \tag{5}$$
where L(θ̂_L, x) is the maximized likelihood of the parameter vector θ, x is a random sample, θ̂_L is the maximum likelihood estimate of θ, and q is the number of unknown parameters. A larger likelihood indicates a better fit, while the 2q term penalizes model complexity. In this work, the model with the minimum AIC is selected as the “optimal” model.
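The minimum-AIC rule can be sketched as follows, assuming Gaussian errors so that −2 ln L has a closed form in the residual sum of squares. This is a generic illustration with hypothetical function names; GWR software typically uses a corrected AIC (AICc) with an effective number of parameters rather than the plain parameter count q shown here.

```python
import math

import numpy as np


def gaussian_aic(residuals, q):
    """AIC = -2 ln L + 2q, assuming i.i.d. Gaussian errors.

    With the ML variance estimate sigma^2 = RSS / n, the maximized
    log-likelihood gives -2 ln L = n * ln(2 * pi * sigma^2) + n.
    """
    r = np.asarray(residuals, dtype=float)
    n = r.size
    sigma2 = float(np.sum(r ** 2)) / n
    return n * math.log(2.0 * math.pi * sigma2) + n + 2 * q


def select_min_aic(candidates):
    """candidates maps a label (e.g. a bandwidth) to (residuals, q); return the
    label with the smallest AIC together with all scores."""
    scores = {label: gaussian_aic(res, q) for label, (res, q) in candidates.items()}
    return min(scores, key=scores.get), scores
```

A model with more parameters wins only if its reduction in residual variance outweighs the extra 2q penalty.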

2.2. Support Vector Machine

The support vector machine (SVM) is grounded mainly in two theories [43]: Vapnik–Chervonenkis (VC) dimension theory and statistical learning theory. One of the most important applications of SVM is classification. Because of its satisfactory performance and fault tolerance, SVM has attracted increasing attention and is widely used in machine learning, data mining and knowledge discovery [44,45], as well as in landslide susceptibility assessment [14,15,16,17]. The SVM method is briefly introduced as follows [46,47]. Assume that a set of linearly separable training vectors x_i (i = 1, 2, ⋯, R, where R is the total number of vectors) consists of two classes, y_i = ±1, denoting landslide occurrence and non-occurrence. The aim of SVM is to find an n-dimensional hyperplane that separates the two classes with the maximum margin, as shown in Figure 1. This hyperplane is found by solving:
$$\min_{w,b}\ \frac{1}{2}\lVert w\rVert^{2} \quad \text{s.t.}\quad y_i\,(w \cdot x_i + b) \ge 1,\quad i = 1, 2, \ldots, R \tag{6}$$
where ‖w‖ is the two-norm of w, w is a vector perpendicular to the hyperplane, b is the bias (offset), which allows the hyperplane not to pass through the origin, and x_i is the ith training vector. By introducing non-negative Lagrange multipliers λ_i, the following Lagrangian (cost function) is obtained:
$$L(w, b, \lambda) = \frac{1}{2}\lVert w\rVert^{2} - \sum_{i=1}^{R} \lambda_i \left[ y_i\,(w \cdot x_i + b) - 1 \right] \tag{7}$$
The solution is obtained by minimizing Equation (7) with respect to w and b and maximizing it with respect to λ_i (i.e., by solving the dual problem). In the non-separable case, the constraints are relaxed by introducing non-negative slack variables ξ_i, and the optimization problem becomes:
$$\min_{w,b,\xi}\ \frac{1}{2}\lVert w\rVert^{2} + C\sum_{i=1}^{R} \xi_i \quad \text{s.t.}\quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i,\quad \xi_i \ge 0 \tag{8}$$
where ξ_i (ξ_i ≥ 0) is the slack variable, which measures how far a wrongly placed point lies from its correct side of the margin, and C is the penalty parameter of the error term.
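A minimal sketch of the soft-margin problem in Equation (8): at the optimum each slack equals ξ_i = max(0, 1 − y_i(w·x_i + b)), so the problem can be rewritten as an unconstrained hinge-loss objective and minimized by plain subgradient descent. The solver, learning rate, epoch count and toy data are assumptions for illustration; many SVM libraries, such as LIBSVM, solve the dual problem instead.

```python
import numpy as np


def train_linear_svm(X, y, C=1.0, lr=0.01, epochs=1000):
    """Soft-margin linear SVM by subgradient descent on
    0.5 * ||w||^2 + C * sum_i max(0, 1 - y_i (w . x_i + b)), with y_i in {+1, -1}."""
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        margins = y * (X @ w + b)
        viol = margins < 1.0  # points with a positive slack xi_i
        grad_w = w - C * (y[viol][:, None] * X[viol]).sum(axis=0)
        grad_b = -C * y[viol].sum()
        w = w - lr * grad_w
        b = b - lr * grad_b
    return w, b


def slacks(X, y, w, b):
    """Slack values xi_i = max(0, 1 - y_i (w . x_i + b)) implied by a solution."""
    return np.maximum(0.0, 1.0 - y * (X @ w + b))
```

Points with ξ_i = 0 lie on or outside the margin; a larger C penalizes slack more heavily and therefore narrows the margin.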
In addition, the Gaussian radial basis function (RBF), introduced as a kernel function by Vapnik [43], is used to model nonlinear decision boundaries:
$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^{2}\right) \tag{9}$$
where γ > 0 is a parameter that controls the width of the Gaussian kernel (a larger γ gives a narrower kernel). This kernel function is robust.
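The RBF kernel of Equation (9) can be evaluated for all pairs of samples at once. The sketch below (function name hypothetical) builds the Gram matrix with NumPy and shows that increasing γ shrinks the similarity between distant points.

```python
import numpy as np


def rbf_kernel(A, B, gamma):
    """Gram matrix K[i, j] = exp(-gamma * ||A_i - B_j||^2)."""
    sq_dists = (
        np.sum(A ** 2, axis=1)[:, None]
        + np.sum(B ** 2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    # Clip tiny negative values caused by floating-point cancellation.
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))
```

Each sample has similarity 1 with itself, and the off-diagonal entries decay with squared distance at a rate set by γ.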

2.3. Particle Swarm Optimization

The PSO algorithm is an evolutionary computation technique [25] derived from complex adaptive system (CAS) theory. It was originally inspired by the regular patterns of bird flocking, from which a simplified model based on swarm intelligence was established. In PSO, each candidate solution of the optimization problem is a bird in the search space, called a “particle”. PSO is initialized with a group of random particles and searches for the optimal solution by iterative evolution. In each iteration, the particles update their velocities and positions by tracking two extremes: the best position found so far by each particle and the best position found by the whole swarm. This behavior of the ith particle can be expressed mathematically as follows [48]:
$$\begin{cases} V_i^{n+1} = t\,V_i^{n} + c_1 r_1 \left(p_i^{n} - x_i^{n}\right) + c_2 r_2 \left(p_g^{n} - x_i^{n}\right) \\ x_i^{n+1} = x_i^{n} + V_i^{n+1} \end{cases} \tag{10}$$
where i = 1, 2, ⋯, K, K is the total number of particles and n is the current iteration number. t is the inertia weight; p_i^n and p_g^n are the best position found by the ith particle and the best position found by all particles up to iteration n, respectively. V_i^n and x_i^n are the current velocity and position of the ith particle, and V_i^{n+1} and x_i^{n+1} are its updated velocity and position at iteration n + 1. c1 and c2 are learning factors, and r1 and r2 are random numbers in the range [0, 1]. The process of the PSO algorithm is shown in Figure 2.
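The update rules in Equation (10) translate directly into the following sketch for minimizing a function over a box. The parameter values (t = 0.7, c1 = c2 = 1.5, 30 particles, 200 iterations), the function name and the clipping of particles to the search bounds are illustrative assumptions, not settings reported in this study.

```python
import numpy as np


def pso_minimize(f, lower, upper, n_particles=30, n_iter=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize f over the box [lower, upper] with the velocity/position updates of PSO."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    x = rng.uniform(lower, upper, size=(n_particles, dim))  # positions x_i
    v = np.zeros_like(x)                                     # velocities V_i
    p_best = x.copy()                                        # p_i: best position of each particle
    p_val = np.array([f(xi) for xi in x])
    g_best = p_best[np.argmin(p_val)].copy()                 # p_g: best position of the swarm
    for _ in range(n_iter):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = inertia * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
        x = np.clip(x + v, lower, upper)  # move with the new velocity, stay inside the box
        vals = np.array([f(xi) for xi in x])
        improved = vals < p_val
        p_best[improved] = x[improved]
        p_val[improved] = vals[improved]
        g_best = p_best[np.argmin(p_val)].copy()
    return g_best, p_val.min()
```

For example, `pso_minimize(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [-5, -5], [5, 5])` should return a position close to (1, −2).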

2.4. The PSO-SVM Model

The key to improving the performance of the SVM model is the selection of its parameters. Although introducing a kernel function allows SVM to handle nonlinear problems, the problem of selecting the kernel parameters remains [22]. Combining the PSO algorithm with the SVM model can effectively solve this problem. Taking the RBF as the kernel function, the flowchart of the PSO-SVM algorithm is shown in Figure 3, and the details of the algorithm are summarized in Table 1 [22,49].
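To make the PSO-SVM idea concrete, the sketch below lets PSO search over (log10 C, log10 γ) and scores each particle by the misclassification rate of an RBF SVM on held-out data. To keep the example self-contained, it uses a simplified bias-free SVM trained by dual coordinate ascent instead of a full SVM library; this swap, the search box, the swarm size and the validation-error fitness are all assumptions, and Table 1 remains the reference for the actual procedure.

```python
import numpy as np


def rbf(A, B, gamma):
    d2 = np.sum(A ** 2, axis=1)[:, None] + np.sum(B ** 2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(d2, 0.0))


def train_svm(X, y, C, gamma, sweeps=30):
    """Bias-free RBF SVM: dual coordinate ascent with box constraints 0 <= alpha_i <= C."""
    K = rbf(X, X, gamma)
    alpha = np.zeros(len(y))
    f = np.zeros(len(y))  # f_i = sum_j alpha_j y_j K_ij, updated incrementally
    for _ in range(sweeps):
        for i in range(len(y)):
            new = min(max(alpha[i] + (1.0 - y[i] * f[i]) / K[i, i], 0.0), C)
            if new != alpha[i]:
                f += (new - alpha[i]) * y[i] * K[:, i]
                alpha[i] = new
    return alpha


def validation_error(log_params, X_tr, y_tr, X_va, y_va):
    """PSO fitness: misclassification rate on held-out data for (log10 C, log10 gamma)."""
    C, gamma = 10.0 ** log_params
    alpha = train_svm(X_tr, y_tr, C, gamma)
    pred = np.sign(rbf(X_va, X_tr, gamma) @ (alpha * y_tr))
    return float(np.mean(pred != y_va))


def pso_svm(X_tr, y_tr, X_va, y_va, n_particles=8, n_iter=10, seed=0):
    rng = np.random.default_rng(seed)
    lo, hi = np.array([-1.0, -2.0]), np.array([3.0, 1.0])  # assumed log10 search box

    def fit(p):
        return validation_error(p, X_tr, y_tr, X_va, y_va)

    x = rng.uniform(lo, hi, size=(n_particles, 2))
    v = np.zeros_like(x)
    p_best, p_val = x.copy(), np.array([fit(p) for p in x])
    g_best = p_best[np.argmin(p_val)].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = 0.7 * v + 1.5 * r1 * (p_best - x) + 1.5 * r2 * (g_best - x)
        x = np.clip(x + v, lo, hi)
        vals = np.array([fit(p) for p in x])
        better = vals < p_val
        p_best[better], p_val[better] = x[better], vals[better]
        g_best = p_best[np.argmin(p_val)].copy()
    C, gamma = 10.0 ** g_best
    return C, gamma, float(p_val.min())
```

Searching in log space lets the swarm cover several orders of magnitude of C and γ with the same step sizes.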

3. The Proposed GWR-PSO-SVM Model

In this work, we present a coupled model by combining the techniques of GWR, PSO and SVM. The flowchart of our method is summarized in Figure 4. In the following, each step of our method is briefly introduced.

3.1. Factor Screening

Some environmental factors are highly correlated with each other. If the coupling model is constructed from such factors, errors may be introduced and prediction accuracy may not be effectively improved. It is therefore necessary to screen the environmental factors. Correlation analysis, one of the most commonly used methods for selecting environmental factors, is adopted in our method. The remaining factors are further screened based on their importance values, and the factors that pass both steps are used for the subsequent landslide prediction.
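The two-step screening can be sketched as follows, using the thresholds T1 = 0.5 (absolute Pearson correlation) and T2 = 0.02 (importance) reported in Section 5.1. The function name is hypothetical, and two details are assumptions of this sketch: when two factors are correlated above T1, the more important one is kept, and the importance values are supplied externally (in this study they come from SPSS Clementine).

```python
import numpy as np


def screen_factors(data, names, importance, t1=0.5, t2=0.02):
    """Two-step factor screening.

    data:       (n_units, n_factors) factor values per slope-unit
    names:      factor names, one per column
    importance: importance value of each factor
    Step 1 visits factors from most to least important and drops a factor whose
    |Pearson r| with an already kept factor exceeds t1. Step 2 drops factors with
    importance below t2.
    """
    importance = np.asarray(importance, dtype=float)
    r = np.corrcoef(data, rowvar=False)
    kept = []
    for j in np.argsort(importance)[::-1]:
        if all(abs(r[j, k]) <= t1 for k in kept):
            kept.append(j)
    return [names[j] for j in kept if importance[j] >= t2]
```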

3.2. Study Area Segmentation

GWR allows different relationships to exist at different points in the study area and improves modeling performance by reducing spatial autocorrelation [50]. Following Tobler's first law of geography, which relates nearness to similarity, observations nearer to a given location should carry greater weight in the estimation than observations farther away [51]. We can therefore use this technique to estimate model parameters at individual locations. To segment the study area, we compute and map the GWR coefficient values to explore how the relationships associated with the environmental factors vary across the study area.
The natural breaks method is a typical classification method based on the natural groupings inherent in the data [52]. Since GWR coefficient values characterize the spatial autocorrelation of the factors, we classify the study area, for each environmental factor, into several classes within which the GWR coefficient values are highly similar. The total number of classes, N, has a great impact on the resulting segmentation maps. If N is very large, the segmentation map contains too many small partitions, which makes it difficult to construct training and verification samples and to obtain satisfactory prediction accuracies, as discussed in Section 6.3. In addition, spatial dependency cannot be effectively reduced, since the region centers are very close to each other. Conversely, if N is too small, the segmentation map contains very few large partitions, so spatial autocorrelation cannot be effectively alleviated within each region and the prediction results are strongly affected. Furthermore, our method cannot achieve regional-scale landslide prediction with so few prediction regions in the entire study area. The influence of the prediction regions is discussed in detail in Section 6.2.
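The natural breaks classification can be sketched as an exact dynamic program that splits the sorted values into N contiguous classes with the smallest total within-class sum of squared deviations, the criterion usually associated with Jenks natural breaks. The function names are hypothetical; this O(N·n²) version is meant for illustration, and GIS software may use a different implementation.

```python
import numpy as np


def natural_breaks(values, n_classes):
    """Split values into n_classes contiguous groups minimizing the total
    within-class sum of squared deviations; return each class's upper bound."""
    x = np.sort(np.asarray(values, dtype=float))
    n = x.size
    # Prefix sums give the squared deviations of any slice x[a:b] in O(1).
    s1 = np.concatenate([[0.0], np.cumsum(x)])
    s2 = np.concatenate([[0.0], np.cumsum(x ** 2)])

    def ssd(a, b):
        return s2[b] - s2[a] - (s1[b] - s1[a]) ** 2 / (b - a)

    # cost[k, b]: best total SSD when x[:b] is split into k classes.
    cost = np.full((n_classes + 1, n + 1), np.inf)
    split = np.zeros((n_classes + 1, n + 1), dtype=int)
    cost[0, 0] = 0.0
    for k in range(1, n_classes + 1):
        for b in range(k, n + 1):
            for a in range(k - 1, b):
                c = cost[k - 1, a] + ssd(a, b)
                if c < cost[k, b]:
                    cost[k, b], split[k, b] = c, a
    # Walk back through the stored split points to recover the class bounds.
    bounds, b = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(x[b - 1])
        b = split[k, b]
    return bounds[::-1]


def classify(values, bounds):
    """Class index 0..N-1 for each value, given ascending class upper bounds."""
    return np.searchsorted(bounds, values, side="left")
```

Applied to the GWR coefficients of one factor with N = 3, `classify` assigns every slope-unit to one of three coefficient classes.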
To further weaken spatial autocorrelation, we superpose the classification maps of selected environmental factors, as shown in Figure 5. The factors to superpose are chosen according to the importance values of all environmental factors, as measured by the SVM model. The superposition is a simple intersection of all classes obtained from the most important environmental factors. This process usually over-segments the study area, although the GWR coefficient values within each region are consistent for each individual environmental factor. As a result, spatial autocorrelation cannot be thoroughly removed, since the centers of neighboring prediction regions are too close to each other. Moreover, it is very difficult to select training and verification samples for landslide prediction when the regions are so small. It is therefore necessary to merge small regions in the superposed map. To this end, the distribution of landslides in the study area is considered: (i) prediction regions that divide a landslide should be merged into one prediction region; (ii) adjacent small regions containing landslides that are far from other landslide areas should be merged into one prediction region; (iii) a large region without landslides should not be merged with regions containing landslides, as shown in Figure 6.
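The simple intersection step (before any merging) can be sketched by giving every distinct combination of class labels its own region id. This hypothetical helper ignores spatial contiguity, so non-adjacent slope-units with the same label combination share an id; separating them into connected regions would require the slope-unit adjacency, and the three landslide-based merging rules are not implemented here.

```python
import numpy as np


def superpose(*class_maps):
    """Intersect per-factor classification maps (one class label per slope-unit).

    Slope-units that share the same combination of class labels across all maps
    receive the same region id.
    """
    stacked = np.column_stack(class_maps)
    _, region_ids = np.unique(stacked, axis=0, return_inverse=True)
    return region_ids.reshape(-1)
```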

3.3. The GWR-PSO-SVM Model

Once the study area is divided into several prediction regions by clustering the GWR coefficients, an SVM model with an RBF kernel is used as the prediction component of the coupling model. To improve prediction performance, the PSO algorithm is embedded in the SVM model to obtain the optimal parameters C and γ for each prediction region. The details of the GWR-PSO-SVM model for landslide prediction are given in Table 2.

4. Study Area and Data

4.1. General Characteristics

The Three Gorges span from the western Sichuan Basin upstream to the eastern Jianghan Basin downstream [53]. Wanzhou is a district of Chongqing Municipality, bordering Sichuan Province to the northwest and Hubei Province to the southeast. It is one of the main ports of the Yangtze River basin and an important industrial, cultural, trade and transportation center in Yudong. The district covers an area of 3457 km² between longitudes 107°52′22″ and 108°53′25″ E and latitudes 30°24′25″ and 31°14′58″ N, within the subtropical moist climate zone, with a mild climate and abundant rainfall. The annual average precipitation is 1191.3 mm, and around 70% of it falls from May to September. Our study area is located in the center of the Wanzhou district along an 80 km-long stretch of the Yangtze River; it covers an area of 552 km², with elevations between 21 m and 1015 m, as shown in Figure 7.

4.2. Geological Setting

The Wanzhou district lies on the two limbs of the Wanxian synclinorium of the Eastern Sichuan fold belt. Anticlines and synclines alternate in this area and form a typical ejective fold structure [54]. The geological and tectonic framework map and a schematic geological cross-section of the study area are shown in Figure 8a,b, respectively [55].

4.3. Description of Landslides

In the study area, the accurate sizes and shapes of previously investigated landslides were extracted from data of the Headquarters of Prevention and Control of Geo-Hazards in Area of Three Gorges Reservoir [56]. In addition, high-resolution aerial photographs were used to detect new landslides triggered by the impoundment of the Three Gorges Project since 2003, while historical and literature data were used to identify older landslides that were activated during the Holocene and/or Pleistocene, before the impoundment. In total, 233 landslides were mapped in the study area.
Note that terrain data under the Yangtze River cannot be obtained, since no such information is recorded in topographic maps or in the Advanced Spaceborne Thermal Emission and Reflection Radiometer (ASTER) G-DEM data. As a result, DEM values change abruptly at the boundary between the riverbanks and the river surface, which affects the environmental factors derived from the DEM. We therefore excluded the Yangtze River from the study area. For prediction, computing units are automatically derived from high-quality digital terrain models (DTMs) by the slope-units method, which partitions the territory into hydrological regions bounded by drainage and divide lines [57]. In this work, the study area is divided into 1909 slope-units, including 416 landslide slope-units with a total area of 24.06 km², covering 4.36% of the study area. As Figure 7c shows, the sizes of the landslides in this area vary greatly. For instance, the Fuma landslide, with an area of approximately 1.12 km², is the largest, while the smallest, the Xianjia 6 group landslide, has an area of 3539.77 m².

4.4. Environmental Factors of Landslides

In this work, ancillary data used for extraction of environmental factors are the following:
  • High-resolution aerial photographs;
  • 1:50,000-Scale geological maps [55];
  • ASTER G-DEM data with a spatial resolution of 30 m;
  • Landsat-8 OLI+ sensor data, acquired on 24 February 2013 (Path/Row 127/39, spatial resolution 30 m), for extracting land-use and calculating the Normalized Difference Vegetation Index (NDVI) and the Normalized Difference Water Index (NDWI);
  • Precipitation and seismic data from the China Meteorological Administration and the China Earthquake Administration for obtaining the precipitation and seismic factors.
Many researchers have verified the correlations between various environmental factors and landslide occurrence [58]. Based on these contributions and the characteristics of the study area, 29 environmental factors are selected to predict the potential distribution of landslides, including geomorphological, geological, hydrological, land cover, meteorological and geophysical factors. The selected environmental factors and their original values are listed in Table 4. In particular, the classification of the bedding structure is shown in Table 3. This factor is based on the topography bedding intersection angle (TOBIA) index [59], which is computed from the slope aspect, slope angle, bed dip direction and bed dip angle. The numbers of landslides corresponding to the different bedding structures are shown in Figure 9, which indicates that landslides occur on every type of slope. It should be mentioned that there are many horizontal strata landslides in the study area [60]. Since the formation mechanism of this type of landslide is very complicated and beyond the scope of this article, the gently dipping structure is not addressed in this work. Figure 9 also shows strong relationships between the different types of slope and landslide occurrence. Therefore, this factor is an important indicator of landslides and should be taken into account for prediction.
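For illustration, the TOBIA value can be computed as the angle between the slope surface and the bedding plane, i.e., the angle between their unit normal vectors. The sketch below (function name hypothetical) assumes all angles are in degrees, with aspect and dip direction measured as azimuths; the class boundaries that map TOBIA and dip values to the bedding-structure classes of Table 3 are not reproduced here.

```python
import numpy as np


def tobia(slope_deg, aspect_deg, dip_deg, dip_dir_deg):
    """Angle (degrees) between the slope surface and the bedding plane.

    Computed as the angle between the two planes' unit normals:
    cos(TOBIA) = cos(s)cos(d) + sin(s)sin(d)cos(aspect - dip direction).
    Works element-wise on NumPy arrays as well as on scalars.
    """
    s, a, d, dd = (np.radians(v) for v in (slope_deg, aspect_deg, dip_deg, dip_dir_deg))
    cos_t = np.cos(s) * np.cos(d) + np.sin(s) * np.sin(d) * np.cos(a - dd)
    return np.degrees(np.arccos(np.clip(cos_t, -1.0, 1.0)))
```

A slope that faces the same way as the bedding dip and has the same inclination gives a TOBIA of 0°, while horizontal bedding gives a TOBIA equal to the slope angle.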
Unlike grid cells, slope-units are irregular, i.e., the slope-units produced by this method differ in area. The first problem of the slope-units method is therefore how to assign a normalized value to each slope-unit. If an environmental factor in Table 4 is a continuous variable, such as elevation, slope angle or terrain surface convexity, the mean value of the factor within the slope-unit is used as its normalized value. If the factor is a discrete variable, such as slope form, lithology, bedding structure or land-use, the most frequently occurring value within the slope-unit is used. In this way, each of the 1909 slope-units is assigned a unique value for each factor, and these values are used in all prediction models in this work to obtain the landslide susceptibility of the study area.
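The assignment rule above can be sketched as follows, assuming each raster cell has already been tagged with the id of the slope-unit that contains it. The function name and inputs are hypothetical; ties in the mode are resolved in favor of the value encountered first.

```python
from collections import Counter

import numpy as np


def aggregate_to_units(unit_ids, values, continuous):
    """One value per slope-unit from the raster cells it contains.

    Continuous factors use the mean of the cell values; discrete factors use the
    most frequent value (mode).
    """
    unit_ids = np.asarray(unit_ids)
    values = np.asarray(values)
    result = {}
    for uid in np.unique(unit_ids):
        cells = values[unit_ids == uid]
        if continuous:
            result[uid.item()] = float(cells.mean())
        else:
            result[uid.item()] = Counter(cells.tolist()).most_common(1)[0][0]
    return result
```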

5. Results

5.1. Experimental Results of the GWR-PSO-SVM Model

As mentioned in Section 3.1, the classical Pearson product-moment correlation coefficient (PPMCC) is used to screen out highly correlated environmental factors, with a threshold T1 = 0.5. For brevity, the correlations of the geomorphological and hydrological factors are listed in Tables 5 and 6; in total, 10 factors are excluded from all the models used here. The remaining 19 environmental factors are relatively independent and are further screened based on their importance values, which range from 0 to 0.205, as illustrated in Figure 10, obtained using SPSS Clementine 12 software (IBM, Armonk, NY, USA). We set T2 = 0.02 and exclude every environmental factor whose importance value is less than T2. Finally, 12 environmental factors are selected to construct the coupling model: catchment slope, distance from drainage, NDVI, bedding structure, slope angle, topographic wetness index, precipitation, lithology, NDWI, vertical distance to channel network, land-use and elevation.
According to the selection criterion described in Section 3.2, the most important environmental factors, i.e., catchment slope, distance from drainage and NDVI, are selected as the regional division factors. Their GWR coefficients are obtained using an adaptive bi-square kernel and the AIC in the GWR method. The GWR coefficient values of catchment slope are shown in Figure 11, in which distinct spatial clusters of GWR coefficients can easily be observed. Based on the relationship between GWR and spatial autocorrelation mentioned in Section 1, the GWR coefficients within each cluster can be inferred to be very close. Consequently, spatial dependency is greatly reduced if each cluster is considered as a spatial variable. It is therefore possible to partition the study area into different prediction regions with very limited spatial autocorrelation.
In this work, we set N = 3, i.e., the GWR coefficients of each selected environmental factor are classified into three classes by the natural breaks method; the corresponding classification maps are shown in Figure 12a–c. For convenience, a slope-unit without a landslide is called a non-landslide slope-unit, and a slope-unit containing a landslide is called a landslide slope-unit. The result of the simple superposition is shown in Figure 13a. After applying the three merging rules described in Section 3.2, the study area is finally divided into 34 prediction regions. Each prediction region is assigned a unique label, as shown in Figure 13b, and 25 of these regions contain landslides. The numbers of slope-units and landslide slope-units are listed in Table 7.
For the GWR-PSO-SVM prediction model, samples must be drawn from every prediction region as input. In each prediction region in Figure 13b, landslide slope-units are labeled “1” and non-landslide slope-units are labeled “0”. In our experiment, the same number of landslide and non-landslide slope-units is used in each prediction region to form the training and verification samples. As Figure 13 shows, each prediction region always contains more non-landslide slope-units than landslide slope-units. Therefore, all of the landslide slope-units, together with the same number of randomly selected non-landslide slope-units, form the required samples. The proposed GWR-PSO-SVM model is a local model: it uses the PSO algorithm to generate the optimal C and γ of the SVM model for each prediction region, as listed in Table 8. Prediction regions without landslides are not included in this table. We then apply the SVM classifier to estimate the probability that each slope-unit contains a landslide and show the corresponding probability maps in Figure 14. The probability values, ranging from 0% to 100%, represent different degrees of landslide susceptibility.
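The balanced sampling described above can be sketched as follows. The helper is hypothetical, the random generator and seed are assumptions, and the further split into training and verification sets is not shown because its ratio is not specified here.

```python
import numpy as np


def balanced_samples(region_ids, labels, seed=0):
    """Per-region balanced sample sets.

    For each region with at least one landslide slope-unit (label 1), keep all of
    them plus the same number of randomly drawn non-landslide slope-units (label 0).
    Regions without landslides are skipped. Returns {region_id: unit indices}.
    """
    rng = np.random.default_rng(seed)
    region_ids = np.asarray(region_ids)
    labels = np.asarray(labels)
    samples = {}
    for r in np.unique(region_ids):
        idx = np.flatnonzero(region_ids == r)
        pos = idx[labels[idx] == 1]
        neg = idx[labels[idx] == 0]
        if pos.size == 0:
            continue
        # Regions normally hold more non-landslide than landslide units;
        # min() only guards against the opposite case.
        neg_draw = rng.choice(neg, size=min(pos.size, neg.size), replace=False)
        samples[r.item()] = np.concatenate([pos, neg_draw])
    return samples
```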

5.2. Methods to Assess Model Performance

To objectively evaluate the performance of the models considered, three methods are utilized. The first measure is overall prediction accuracy, which is used to evaluate prediction correctness and can be defined as:
$$p = \frac{a + b}{S} \times 100\% \tag{11}$$
where a and b are the numbers of correctly predicted landslide and non-landslide slope-units in the landslide susceptibility map, respectively, and S is the total number of slope-units in the study area. According to Equation (11), this measure can be directly applied to global models, such as the SVM, PSO-SVM and RS-SVM models, by considering the entire study area. For the GWR-based models, the counts are computed in each prediction region. In this work, the final overall prediction accuracy is therefore defined as follows:
$$p = \frac{\sum_{i=1}^{n_{pr}} (a_i + b_i)}{\sum_{i=1}^{n_{pr}} S_i} \times 100\% \tag{12}$$
where i = 1, 2, …, n_pr (n_pr is the total number of prediction regions), a_i and b_i are the numbers of correctly predicted landslide and non-landslide slope-units in the ith prediction region, respectively, and S_i is the number of slope-units in that region.
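Equation (12) pools the counts over all prediction regions, so each region is weighted by its number of slope-units. The short sketch below (hypothetical function, illustrative numbers) shows that this generally differs from averaging the per-region percentages.

```python
def overall_accuracy(region_counts):
    """Pooled overall accuracy of Equation (12), in percent.

    region_counts: list of (a_i, b_i, S_i) per prediction region, where a_i and
    b_i are correctly predicted landslide and non-landslide slope-units and S_i
    is the number of slope-units in the region. A single region reduces this
    to Equation (11).
    """
    correct = sum(a + b for a, b, _ in region_counts)
    total = sum(s for _, _, s in region_counts)
    return 100.0 * correct / total
```

For two regions with 90% and 50% accuracy containing 100 and 50 slope-units, `overall_accuracy([(10, 80, 100), (5, 20, 50)])` gives 115/150 ≈ 76.7%, whereas the unweighted mean of the two percentages would be 70%.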
The second measure evaluates, for each susceptibility class of the landslide susceptibility maps produced by the compared models, the proportion of slope-units in that class that are landslide slope-units. This measure is called class-specific accuracy and is defined as follows:
$$ p_j = \frac{A_j}{B_j} \times 100\% $$
where $j = 1, 2, \ldots, M$ ($M$ is the total number of landslide susceptibility zones), and $A_j$ and $B_j$ are the numbers of landslide slope-units and of all slope-units in the $j$th landslide susceptibility zone, respectively. To compute this measure, the study area is first classified into $M$ landslide susceptibility zones. Here this is done with the fixed interval method, which, following previous studies, segments the study area using predefined thresholds and is widely used to compare multiple models [7,46,61].
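Once every slope-unit has been assigned to a susceptibility zone, the class-specific accuracy is a per-zone ratio. The sketch below assumes the zone labels and landslide flags are given as two parallel lists (a hypothetical input layout):

```python
from collections import Counter


def class_specific_accuracy(zones, is_landslide):
    """Return {zone: p_j} with p_j = A_j / B_j * 100.

    zones: susceptibility zone of each slope-unit (e.g. "very high").
    is_landslide: parallel booleans, True for landslide slope-units.
    """
    totals = Counter(zones)  # B_j: all slope-units in zone j
    hits = Counter(z for z, ls in zip(zones, is_landslide) if ls)  # A_j
    return {z: 100.0 * hits[z] / totals[z] for z in totals}


zones = ["low", "low", "high", "high", "high", "very high"]
landslide = [False, False, True, False, True, True]
print(class_specific_accuracy(zones, landslide))
```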
The third measure is the classical receiver operating characteristic (ROC) curve and the area under the curve (AUC). A ROC curve plots the true positive rate (sensitivity) as a function of the false positive rate (100% − specificity) for different cut-off points, so each point on the curve is the sensitivity/specificity pair for a particular decision threshold. A test with perfect discrimination (no overlap between the two distributions) has a ROC curve that passes through the upper left corner (100% sensitivity, 100% specificity). Therefore, the closer a curve is to the upper left corner, the better the prediction results [62].
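To make the ROC/AUC measure concrete, here is a minimal from-scratch sketch (not the software used in this work). It sweeps the decision threshold over the predicted landslide probabilities, treats tied scores as a single threshold, expresses the false positive rate as a fraction (1 − specificity) rather than a percentage, and integrates the curve with the trapezoidal rule:

```python
def roc_curve(scores, labels):
    """ROC points (FPR, TPR) obtained by sweeping the decision threshold.

    scores: predicted landslide probabilities; labels: 1 landslide, 0 not.
    Tied scores are consumed together so the curve does not depend on input order.
    """
    pos = sum(labels)
    neg = len(labels) - pos
    pairs = sorted(zip(scores, labels), key=lambda t: t[0], reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(pairs):
        threshold = pairs[i][0]
        # count every sample sharing this score before emitting a point
        while i < len(pairs) and pairs[i][0] == threshold:
            if pairs[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


print(auc(roc_curve([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])))  # 0.75
```

A perfect ranking yields an AUC of 1.0 (the curve passes through the upper left corner), and a classifier that assigns the same score to everything yields 0.5.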

5.3. Comparison with Other Models

To better demonstrate the performance of our model, it is compared with several other models: (1) the SVM model, in which the entire study area is used for sampling and prediction; (2) the PSO-SVM model, in which the PSO algorithm is used to obtain the optimal C and γ to improve prediction accuracy; (3) the landslide susceptibility mapping method based on rough sets (RS) and SVM proposed by Peng et al. [46]; and (4) the GWR-SVM model, a local model similar to our coupling model but without the PSO step for obtaining the optimal C and γ. RS theory is an effective tool introduced by Pawlak [63] and discussed in many papers [64,65,66,67,68,69,70]. It can deal with vague and uncertain information and identify cause-effect relationships in databases as a form of data mining and knowledge discovery [46,63,71]. It has been widely used in various scientific disciplines [72], including remote sensing [73], geographic information science [74] and landslide susceptibility mapping [71]. In the work of Peng et al. [46], it was employed to select key environmental factors for landslide prediction.
For a fair comparison, the same mapping unit and original environmental factors are used for all models. Note that the RS-SVM model differs from the other models in that its input environmental factors are determined with RS theory after the PPMCC analysis. In our experiments, all of the remaining 12 factors are used as input variables for the SVM, PSO-SVM, GWR-SVM and GWR-PSO-SVM models, while the RS-SVM model uses 14 factors selected with RS theory, which excludes land-use, mid-slope position, plane curvature, stream power index and terrain surface convexity from the remaining 19 factors.
The selection of training and verification samples is a key step for any SVM prediction model. As mentioned above, the classical SVM, PSO-SVM and RS-SVM models can be considered global models because samples are selected from the entire study area: all of the landslide slope-units in the study area and the same number of randomly selected non-landslide slope-units are used to train their respective SVM models, while all of the slope-units in the study area are used for verification. In contrast, the GWR-based models select samples in each prediction region rather than in the entire study area, as described in Section 5.1. The sample size of each model is therefore expressed as a number of slope-units in the study area or in each prediction region. Table 9 lists the training and verification sample sizes of all the models. In addition, the PSO algorithm is used in the PSO-SVM and GWR-PSO-SVM models to obtain the optimal C and γ and improve the prediction performance of the SVM model.
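The PSO step that tunes C and γ can be illustrated with a minimal global-best PSO with an inertia weight. This is a generic sketch, not the authors' implementation: the swarm size, iteration count, inertia weight `w` and acceleration coefficients `c1`, `c2` are assumed values, searching over (log2 C, log2 γ) is an assumed convention, and the toy objective only stands in for the validation error an SVM would produce. In practice the objective could, for example, train scikit-learn's `SVC(C=..., gamma=...)` and return one minus its cross-validated accuracy.

```python
import random


def pso_minimize(objective, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization with an inertia weight.

    objective: maps a position (list of floats) to a cost to minimise.
    bounds: (low, high) per dimension; positions are clipped to these bounds.
    """
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                      # best position per particle
    pbest_cost = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda k: pbest_cost[k])
    gbest, gbest_cost = pbest[g][:], pbest_cost[g]   # best position of the swarm

    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia + pull toward personal best + pull toward swarm best
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                lo, hi = bounds[d]
                pos[k][d] = min(max(pos[k][d] + vel[k][d], lo), hi)
            cost = objective(pos[k])
            if cost < pbest_cost[k]:
                pbest[k], pbest_cost[k] = pos[k][:], cost
                if cost < gbest_cost:
                    gbest, gbest_cost = pos[k][:], cost
    return gbest, gbest_cost


# Toy stand-in for an SVM validation error over x = (log2 C, log2 gamma),
# with its minimum placed at C = 2**3, gamma = 2**-2 purely for illustration.
def toy_error(x):
    return (x[0] - 3) ** 2 + (x[1] + 2) ** 2


best, cost = pso_minimize(toy_error, [(-5, 15), (-15, 3)])
C, gamma = 2 ** best[0], 2 ** best[1]
print(C, gamma, cost)
```

In the GWR-PSO-SVM setting, this search would be run once per prediction region, each with its own region-specific objective.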
To make the probability maps more readable, the probability values are divided with the fixed interval method in ArcGIS into five susceptibility categories (very low, low, medium, high and very high), using fixed thresholds of 0.1, 0.35, 0.75 and 0.9, as shown in Figure 15. Figure 15 shows that all of the models achieve the purpose of landslide prediction. In all the susceptibility maps, the very high-susceptibility zones are clearly concentrated in the main urban area of the Wanzhou district, which agrees with the fact that the previously investigated landslides are mainly distributed in this area. However, the distribution of the high and very high-susceptibility zones differs considerably among the models.
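The fixed-interval classification with the thresholds 0.1, 0.35, 0.75 and 0.9 can be sketched as follows. We assume probabilities are expressed as fractions in [0, 1] (divide percentages by 100 first) and that a value exactly equal to a threshold falls into the lower class; the ArcGIS convention used by the authors may differ at the boundaries.

```python
from bisect import bisect_left

CLASSES = ["very low", "low", "medium", "high", "very high"]
THRESHOLDS = [0.1, 0.35, 0.75, 0.9]  # fixed breaks quoted in the text


def susceptibility_class(p):
    """Map a landslide probability in [0, 1] to one of five classes.

    A value equal to a break falls into the lower class (assumed convention).
    """
    return CLASSES[bisect_left(THRESHOLDS, p)]


print([susceptibility_class(p) for p in (0.05, 0.5, 0.95)])
# ['very low', 'medium', 'very high']
```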
For instance, most of the previously investigated landslides are located in high or very high-susceptibility zones in the maps of the SVM, RS-SVM and GWR-SVM models. However, these models also unreliably classify a large number of other slope-units as high or very high-susceptibility zones. Because landslides are a minority class in the study area, the PSO algorithm consistently falls into local optima when it tunes the SVM model over the entire study area. As a consequence, the PSO-SVM model fails to predict the previously investigated landslides in the southwest of the study area effectively. In contrast, the map produced by our model is consistent with the ground truth of landslide distribution. Although our method also uses the PSO algorithm to optimize the SVM parameters, dividing the study area into prediction regions of appropriate size largely prevents it from being trapped in local optima. The high and very high-susceptibility zones concentrate mainly in the previously investigated landslide areas, while most non-landslide areas are classified as low and very low-susceptibility zones, which supports the reliability of the landslide susceptibility predictions. The overall accuracies of landslide susceptibility mapping for all the models are listed in Table 10.
In this table, “Correct” is the number of slope-units correctly predicted in the prediction regions, while “Total” is the number of slope-units in the prediction regions. Note that for the GWR-SVM and GWR-PSO-SVM models, this “Total” is calculated over the prediction regions that contain landslides. The GWR-PSO-SVM model achieves the best prediction accuracy, 91.10%, which is 7.8%–19.1% higher than that of the traditional SVM-based models. To further compare the models, their class-specific accuracies are shown in Figure 16. The class-specific accuracy of the very high-susceptibility zone achieved by our model (96.27%) is the highest among all the models, which means that the very high-susceptibility zones detected by our model mainly contain the previously investigated landslides.
The ROC curves of all the methods are plotted in Figure 17. The closer a ROC curve is to the upper left corner, the better the model discriminates between landslide and non-landslide slope-units. Figure 17 leads to the same conclusion as the two previous measures: the GWR-PSO-SVM model achieves the best prediction result. The ROC curves of the GWR-SVM and RS-SVM models are very close to each other. Because the PSO algorithm is not robust when applied to the whole study area, the ROC curve of the PSO-SVM model is not smooth: it is close to the upper left corner at a 1 − specificity value of 0.2, but falls below the curves of the RS-SVM, GWR-SVM and GWR-PSO-SVM models for values larger than 0.2. The corresponding AUC values are listed in Table 11; the larger the AUC, the better the performance of the prediction model. As shown in this table, our model produces the largest AUC, 0.971.
Note that the prediction region map (Figure 12b) contains a few non-landslide regions, since landslides are a minority class in the study area. To compare our model with the global models, we assume that the overall prediction accuracy of these non-landslide regions is 100%, which may raise the overall accuracy of the entire study area. However, additional experiments (not reported here) show that the AUC value of our model still reaches 0.962 when these non-landslide regions are removed from the study area. Furthermore, all the prediction models were applied to the Zigui–Badong section of the Three Gorges Reservoir for landslide susceptibility mapping. There, too, the GWR-PSO-SVM model obtained the best prediction result, with the highest AUC value (0.965) among all the models, which supports the general applicability of our model. Finally, to compare our model objectively with the other models, we select equal numbers of landslide and non-landslide slope-units in each prediction region. Although the number of training samples is relatively small in some prediction regions, the influence on the overall prediction accuracy is very limited.

6. Discussion

6.1. Impact of Environmental Factors

The global and regional prediction results for the study area always differ, mainly for two reasons. The first is the prediction model: because the SVM model is widely used as a universal model and obtains satisfactory results, it is used by all the models compared here for landslide susceptibility mapping. The second is the impact of environmental factors. Several environmental factors, such as elevation and slope angle, are crucial for landslide prediction, but the most crucial factors differ between parts of the study area. For instance, distance from drainage is highly significant for landslide failures along the Yangtze River, while slope angle may be the most important factor in areas far from the river. Introducing the GWR technique into landslide susceptibility mapping may therefore address these two issues and improve prediction accuracy. The importance values of all the environmental factors in each prediction region, obtained with SPSS Clementine 12, are displayed in Figure 18. The importance values of the final 12 environmental factors selected in Section 5.1 vary between prediction regions, and the ranking of the factors by importance also differs considerably from one prediction region to another.

6.2. Influence of the Number of Regions

To demonstrate how the segmentation of the study area affects prediction performance, the segmentation maps obtained with values of N from 2 to 4 are shown in Figure 19. In Figure 19a, the study area is divided into 10 prediction regions when N = 2. This segmentation may avoid the problem that the importance ranking of the environmental factors differs between prediction regions; however, the impact of each environmental factor at different spatial positions is not taken into account.
For instance, all the prediction regions extend from the Yangtze River to the boundaries of the study area, and the importance ranking of the environmental factors may change greatly across different parts of each region; such variation cannot be adequately captured by the prediction models when prediction regions are very large. In Figure 19c, with N = 4, the study area is segmented into 65 prediction regions. In this case, some prediction regions contain very few slope units, so their landslide and non-landslide slope units are not enough to form the required samples, which reduces landslide prediction accuracy. In contrast, choosing N = 3 divides our study area into 34 prediction regions, and the different impacts of the environmental factors in these regions are effectively incorporated into the prediction models. In addition, each prediction region is large enough to provide the required samples, as shown in Figure 19b.
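The superposition of classified factor maps (Figure 5) can be sketched as grouping slope-units by their combination of class labels. This sketch assumes, as an interpretation rather than a statement from the text, that N is the number of classes each segmentation factor is divided into, so three factors give at most N³ distinct combinations. It ignores spatial contiguity and the merging rules of Figure 6, so it does not reproduce the region counts quoted above. The input layout and function name are hypothetical.

```python
from collections import defaultdict


def superpose(class_maps):
    """Group slope-units by their combination of factor classes.

    class_maps: dict factor_name -> list of class indices (0..N-1), one per
    slope-unit, all lists in the same slope-unit order (hypothetical layout).
    Returns {class_combination: [slope-unit indices]}.
    """
    factors = sorted(class_maps)  # fixed factor order for the combination key
    n_units = len(class_maps[factors[0]])
    groups = defaultdict(list)
    for u in range(n_units):
        key = tuple(class_maps[f][u] for f in factors)
        groups[key].append(u)
    return dict(groups)


# Three factors (A1, A2, A3) classified into N = 2 classes each.
maps = {"A1": [0, 0, 1, 1], "A2": [0, 1, 0, 0], "A3": [1, 1, 1, 1]}
print(superpose(maps))
```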

6.3. Model Sensitivity

To evaluate the sensitivity of the proposed model to the number of training and verification samples, the five prediction regions with the most landslide slope-units are selected, and ROC curves of the prediction performance are obtained using five different percentages of the required sample sets: 20%, 40%, 60%, 80% and 100%. These prediction regions and their ROC curves are shown in Figure 20. In general, the higher the percentage of required samples used, the better the prediction performance: the prediction accuracy of our model is highest when all of the required samples are used and lowest when only 20% are used. The prediction results depend strongly on sample selection because of the complexity of landslides in the study area. If the training set is very small, valuable information cannot be extracted from the environmental factors, which makes it difficult for our model to ensure accurate landslide prediction. In addition, because the required samples are selected within each prediction region, each region already has relatively few training samples. As a result, the prediction accuracy of our model decreases as the number of training samples is reduced.
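The sensitivity experiment draws 20%, 40%, 60%, 80% and 100% of a region's required sample set. A minimal sketch, assuming (not stated in the text) that each reduced set keeps the 1:1 ratio of landslide to non-landslide slope-units; `subsample_balanced` is a hypothetical helper:

```python
import random


def subsample_balanced(positives, negatives, fraction, seed=0):
    """Keep `fraction` of a balanced sample set while preserving the 1:1 ratio.

    positives / negatives: equal-length, non-empty lists of landslide and
    non-landslide slope-unit ids forming one region's full required sample set.
    """
    rng = random.Random(seed)
    k = max(1, round(fraction * len(positives)))  # at least one unit per class
    return rng.sample(positives, k), rng.sample(negatives, k)


pos = [f"L{i}" for i in range(10)]
neg = [f"N{i}" for i in range(10)]
for frac in (0.2, 0.4, 0.6, 0.8, 1.0):
    p, n = subsample_balanced(pos, neg, frac)
    print(frac, len(p), len(n))
```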

7. Conclusions

In this paper, an effective PSO-SVM method based on the GWR technique is presented for landslide susceptibility mapping at a local scale, integrating multisource data from the Wanzhou district in the middle of the Three Gorges Reservoir, China. Landslide events have been reported in the main urban area of the Wanzhou district during the last three years. In our model, the GWR algorithm is used to segment the study area into a series of prediction regions of appropriate size by clustering slope units. A PSO-SVM prediction model is then applied to each prediction region for landslide susceptibility mapping. This allows the proposed GWR-PSO-SVM model to obtain accurate landslide susceptibility maps at a regional scale. Experimental results demonstrate that coupling different models, as in the GWR-PSO-SVM model, achieves better prediction performance than the traditional SVM-based models. These landslide prediction models are comprehensively evaluated using three objective measures: the overall prediction accuracy, the class-specific accuracies of the landslide susceptibility zones, and the ROC curves with their AUC values. We draw the following conclusions: (1) the GWR-PSO-SVM model obtains the best overall accuracy, 91.10%; (2) the GWR-PSO-SVM model achieves the highest class-specific accuracy, 96.27%, for the very high-susceptibility zones, which mainly cover the previously investigated landslides; (3) the GWR-PSO-SVM model achieves a more reliable ROC curve and a higher AUC value of 0.971. Therefore, our model outperforms the traditional prediction models. In future work, the model could be further improved by selecting more suitable segmentation factors and applying segmentation postprocessing.

Acknowledgments

This research is supported by the National Natural Science Foundation of China (61271408), the National High-tech R&D Program of China (Grant 2012AA121303) and Key Laboratory of Precise Engineering and Industry Surveying, National Administration of Surveying, Mapping and Geoinformation (Project No.: PF2012-21). We are grateful to the Headquarters of Prevention and Control of Geo-Hazards in Area of the Three Gorges Reservoir for providing data and material. We also thank the editor and anonymous referees for their comments.

Author Contributions

Xianyu Yu and Yi Wang implemented all the landslide susceptibility prediction modeling, conducted the experiments and finished the first draft. Yi Wang supervised the research and contributed to the editing and review of the manuscript. Ruiqing Niu and Youjian Hu discussed some key issues on our model and provided useful suggestions for improving our work.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Cruden, D.M.; Varnes, D.J. Landslides: Investigation and Mitigation; Chapter 3; Transportation Research Board Special Report: Washington, DC, USA, 1996.
2. Liu, C.; Liu, Y.; Wen, M.; Li, T.; Lian, J.; Qin, S. Geo-hazard initiation and assessment in the Three Gorges Reservoir. In Landslide Disaster Mitigation in Three Gorges Reservoir, China; Springer: Berlin, Germany, 2009; pp. 3–40.
3. Brabb, E.E. The world landslide problem. In Episodes; US International Union of Geological Sciences (IUGS): Bangalore, India, 1991; Volume 14, pp. 52–61.
4. Wan, S.; Chang, S.H. Combined particle swarm optimization and linear discriminant analysis for landslide image classification: Application to a case study in Taiwan. Environ. Earth Sci. 2014, 72, 1453–1464.
5. Regmi, N.R.; Giardino, J.R.; Vitek, J.D. Assessing susceptibility to landslides: Using models to understand observed changes in slopes. Geomorphology 2010, 122, 25–38.
6. Barredo, J.E.I.; Benavides, A.; Herv, A.S.J.; van Westen, C.J. Comparing heuristic landslide hazard assessment techniques using GIS in the Tirajana basin, Gran Canaria Island, Spain. Int. J. Appl. Earth Obs. Geoinf. 2000, 2, 9–23.
7. Dai, F.; Lee, C. Landslide characteristics and slope instability modeling using GIS, Lantau Island, Hong Kong. Geomorphology 2002, 42, 213–228.
8. Ohlmacher, G.C.; Davis, J.C. Using multiple logistic regression and GIS technology to predict landslide hazard in northeast Kansas, USA. Eng. Geol. 2003, 69, 331–343.
9. Ayalew, L.; Yamagishi, H. The application of GIS-based logistic regression for landslide susceptibility mapping in the Kakuda-Yahiko Mountains, Central Japan. Geomorphology 2005, 65, 15–31.
10. Lee, S.; Ryu, J.H.; Min, K.; Won, J.S. Landslide susceptibility analysis using GIS and artificial neural network. Earth Surf. Process. Landforms 2003, 28, 1361–1376.
11. Gomez, H.; Kavzoglu, T. Assessment of shallow landslide susceptibility using artificial neural networks in Jabonosa River Basin, Venezuela. Eng. Geol. 2005, 78, 11–27.
12. Wang, H.; Sassa, K. Rainfall-induced landslide hazard assessment using artificial neural networks. Earth Surf. Process. Landforms 2006, 31, 235–247.
13. Melchiorre, C.; Matteucci, M.; Azzoni, A.; Zanchi, A. Artificial neural networks and cluster analysis in landslide susceptibility zonation. Geomorphology 2008, 94, 379–400.
14. Yao, X.; Tham, L.G.; Dai, F. Landslide susceptibility mapping based on support vector machine: A case study on natural slopes of Hong Kong, China. Geomorphology 2008, 101, 572–582.
15. Marjanović, M.; Kovačević, M.; Bajat, B.; Voženílek, V. Landslide susceptibility assessment using SVM machine learning algorithm. Eng. Geol. 2011, 123, 225–234.
16. Ballabio, C.; Sterlacchini, S. Support vector machines for landslide susceptibility mapping: The Staffora River Basin case study, Italy. Math. Geosci. 2012, 44, 47–70.
17. Xu, C.; Dai, F.; Xu, X.; Lee, Y.H. GIS-based support vector machine modeling of earthquake-triggered landslide susceptibility in the Jianjiang River watershed, China. Geomorphology 2012, 145, 70–80.
18. Erener, A.; Düzgün, H.S.B. Improvement of statistical landslide susceptibility mapping by using spatial and global regression methods in the case of More and Romsdal (Norway). Landslides 2010, 7, 55–68.
19. Sabokbar, H.F.; Roodposhti, M.S.; Tazik, E. Landslide susceptibility mapping using geographically-weighted principal component analysis. Geomorphology 2014, 226, 15–24.
20. San, B.T. An evaluation of SVM using polygon-based random sampling in landslide susceptibility mapping: The Candir Catchment area (Western Antalya, Turkey). Int. J. Appl. Earth Obs. Geoinf. 2014, 26, 399–412.
21. Yao, X.; Zhang, Y.; Zhou, N.; Guo, C.; Yu, K.; Li, L.J. Application of two-class SVM applied in landslide susceptibility mapping. In Project Planning and Project Success: The 25% Solution; Taylor & Francis Group: England, UK, 2014; p. 203.
22. Huang, C.-L.; Dun, J.-F. A distributed PSO–SVM hybrid system with feature selection and parameter optimization. Appl. Soft Comput. 2008, 8, 1381–1391.
23. Zhao, H.; Yin, S. Geomechanical parameters identification by particle swarm optimization and support vector machine. Appl. Math. Model. 2009, 33, 3997–4012.
24. Ren, F.; Wu, X.; Zhang, K.; Niu, R. Application of wavelet analysis and a particle swarm-optimized support vector machine to predict the displacement of the Shuping landslide in the Three Gorges, China. Environ. Earth Sci. 2015, 73, 4791–4804.
25. Kennedy, J.; Eberhart, R. Particle swarm optimization. In Proceedings of the IEEE International Conference on Neural Networks, Perth, Australia, 27 November–1 December 1995; pp. 1942–1948.
26. Holland, J.H. Genetic algorithms and the optimal allocation of trials. SIAM J. Comput. 1973, 2, 88–105.
27. Hassan, R.; Cohanim, B.; De Weck, O.; Venter, G. A comparison of particle swarm optimization and the genetic algorithm. In Proceedings of the 46th AIAA Multidisciplinary Design Optimization Specialist Conference, Austin, TX, USA, 18–21 April 2005; pp. 18–21.
28. Brunsdon, C.; Fotheringham, A.S.; Charlton, M.E. Geographically weighted regression: A method for exploring spatial nonstationarity. Geogr. Anal. 1996, 28, 281–298.
29. Fotheringham, A.S.; Charlton, M.; Brunsdon, C. The geography of parameter space: An investigation of spatial non-stationarity. Int. J. Geogr. Inf. Syst. 1996, 10, 605–627.
30. Schabenberger, O.; Gotway, C.A. Statistical Methods for Spatial Data Analysis; CRC Press: Boca Raton, FL, USA, 2004.
31. Fischer, M.M.; Getis, A. Handbook of Applied Spatial Analysis: Software Tools, Methods and Applications; Springer Science & Business Media: Berlin, Germany, 2009.
32. Nakaya, T. Local spatial interaction modelling based on the geographically weighted regression approach. In Modelling Geographical Systems; Springer: Dordrecht, The Netherlands, 2002; pp. 45–69.
33. Fotheringham, A.S.; Brunsdon, C.; Charlton, M. Geographically Weighted Regression: The Analysis of Spatially Varying Relationships; John Wiley & Sons: New York, NY, USA, 2003.
34. Feuillet, T.; Coquin, J.; Mercier, D.; Cossart, E.; Decaulne, A.; Jónsson, H.P.; Sæmundsson, B. Focusing on the spatial non-stationarity of landslide predisposing factors in Northern Iceland: Do paraglacial factors vary over space? Prog. Phys. Geogr. 2014, 38, 354–377.
35. Brunsdon, C.; Fotheringham, S.; Charlton, M. Spatial nonstationarity and autoregressive models. Environ. Plan. A 1998, 30, 957–973.
36. Wheeler, D.C. Geographically weighted regression. In Handbook of Regional Science; Springer: Berlin, Germany, 2014; pp. 1435–1459.
37. Fotheringham, A.S.; Charlton, M.; Brunsdon, C. Measuring spatial variations in relationships with geographically weighted regression. In Recent Developments in Spatial Analysis; Springer: Berlin, Germany, 1997; pp. 60–82.
38. Chalkias, C.; Kalogirou, S.; Ferentinou, M. Landslide susceptibility, Peloponnese peninsula in south Greece. J. Maps 2014, 10, 211–222.
39. Fotheringham, A.S.; Charlton, M.; Brunsdon, C. Geographically weighted regression: A natural evolution of the expansion method for spatial data analysis. Environ. Plan. A 1998, 30, 1905–1927.
40. Celik, M.; Kazar, B.M.; Shekhar, S.; Boley, D. Parameter Estimation for the Spatial Autoregression Model: A Rigorous Approach. Available online: http://www-users.cs.umn.edu/~boley/publications/papers/NASA06.pdf (accessed on 23 February 2016).
41. Cleveland, W.S. Robust locally weighted regression and smoothing scatterplots. J. Am. Stat. Assoc. 1979, 74, 829–836.
42. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
43. Vapnik, V.N. The Nature of Statistical Learning Theory; Springer: New York, NY, USA, 1995.
44. Barakat, N.; Bradley, A.P. Rule extraction from support vector machines: A review. Neurocomputing 2010, 74, 178–190.
45. Mountrakis, G.; Im, J.; Ogole, C. Support vector machines in remote sensing: A review. ISPRS J. Photogramm. Remote Sens. 2011, 66, 247–259.
46. Peng, L.; Niu, R.; Huang, B.; Wu, X.; Zhao, Y.; Ye, R. Landslide susceptibility mapping based on rough set theory and support vector machines: A case of the Three Gorges Area, China. Geomorphology 2014, 204, 287–301.
47. Suykens, J.A.; Vandewalle, J. Least squares support vector machine classifiers. Neural Process. Lett. 1999, 9, 293–300.
48. Kennedy, J. Particle swarm optimization. In Encyclopedia of Machine Learning; Springer: Berlin, Germany, 2011; pp. 760–766.
49. Xiao, C.; Hao, K.; Ding, Y. The bi-directional prediction of carbon fiber production using a combination of improved particle swarm optimization and support vector machine. Materials 2014, 8, 117–136.
50. Zhang, L.; Bi, H.; Cheng, P.; Davis, C.J. Modeling spatial variation in tree diameter–height relationships. For. Ecol. Manag. 2004, 189, 317–329.
51. Miller, H.J. Tobler’s first law and spatial analysis. Ann. Assoc. Am. Geogr. 2004, 94, 284–289.
52. Jenks, G.F. The data model concept in statistical mapping. Int. Yearb. Cartogr. 1967, 7, 186–190.
53. Li, J.; Xie, S.; Kuang, M. Geomorphic evolution of the Yangtze Gorges and the time of their formation. Geomorphology 2001, 41, 125–135.
54. Zhou, Q.; Lv, Z.; Ma, Z.; Zhang, Y.; Wang, H. Barrier belt division based on RS and GIS in the Three Gorges Reservoir area: A case of Wanzhou district. Procedia Environ. Sci. 2011, 10, 1257–1263.
55. Hubei Province Geological Survey. Geological Map of Zigui and Badong County (1:50,000); Hubei Province Geological Survey Press: Wuhan, China, 1997.
56. Headquarters of Prevention and Control of Geo-Hazards in Area of Three Gorges Reservoir. 1:10,000 Geological Hazard Mapping Database: Yichang, China, 2011.
57. Guzzetti, F.; Carrara, A.; Cardinali, M.; Reichenbach, P. Landslide hazard evaluation: A review of current techniques and their application in a multi-scale study, central Italy. Geomorphology 1999, 31, 181–216.
58. Liu, J.G.; Mason, P.J.; Clerici, N.; Chen, S.; Davis, A.; Miao, F.; Deng, H.; Liang, L. Landslide hazard assessment in the Three Gorges area of the Yangtze River using ASTER imagery: Zigui–Badong. Geomorphology 2004, 61, 171–187.
59. Meentemeyer, R.K.; Moody, A. Automated mapping of conformity between topographic and geological surfaces. Comput. Geosci. 2000, 26, 815–829.
60. Kunlong, Y.; Wenxing, J.; Yang, W.; Chunmei, Z.; Changqian, M.; Liling, L.; Lingling, Y.; Yiping, W. The Research of Three Gorges Reservoir in Wanzhou Area of Nearly Horizontal Strata Landslide Formation Mechanism and Control Engineering; China University of Geosciences Press: Wuhan, China, 2007; pp. 39–65.
61. Chen, T.; Niu, R.; Du, B.; Wang, Y. Landslide spatial susceptibility mapping by using GIS and Remote Sensing techniques: A case study in Zigui county, the Three Georges Reservoir, China. Environ. Earth Sci. 2015, 73, 5571–5583.
62. Zweig, M.H.; Campbell, G. Receiver-operating characteristic (ROC) plots: A fundamental evaluation tool in clinical medicine. Clin. Chem. 1993, 39, 561–577.
63. Pawlak, Z. Rough sets. Int. J. Comput. Inf. Sci. 1982, 11, 341–356.
64. Liu, J.; Zeng, Z.; Liu, H.; Wang, H. A rough set approach to analyze factors affecting landslide incidence. Comput. Geosci. 2011, 37, 1311–1317.
65. Komorowski, J.; Pawlak, Z.; Polkowski, L.; Skowron, A. Rough Sets: A Tutorial. Available online: http://eecs.ceas.uc.edu/~mazlack/dbm.w2011/Komorowski.RoughSets.tutor.pdf (accessed on 23 February 2016).
66. Pawlak, Z. Rough Sets: Theoretical Aspects of Reasoning about Data; Springer Science & Business Media: Berlin, Germany, 2012.
67. Pawlak, Z.; Słowiński, R. Rough set approach to multi-attribute decision analysis. Eur. J. Oper. Res. 1994, 72, 443–459.
68. Kryszkiewicz, M. Rough set approach to incomplete information systems. Inf. Sci. 1998, 112, 39–49.
69. Swiniarski, R.W.; Skowron, A. Rough set methods in feature selection and recognition. Pattern Recognit. Lett. 2003, 24, 833–849.
70. Davvaz, B. Roughness based on fuzzy ideals. Inf. Sci. 2006, 176, 2417–2437.
71. Gorsevski, P.V.; Jankowski, P. Discerning landslide susceptibility using rough sets. Comput. Environ. Urban Syst. 2008, 32, 53–65.
72. Thangavel, K.; Pethalakshmi, A. Dimensionality reduction based on rough set theory: A review. Appl. Soft Comput. 2009, 9, 1–12.
73. Pan, X.; Zhang, S.; Zhang, H.; Na, X.; Li, X. A variable precision rough set approach to the remote sensing land use/cover classification. Comput. Geosci. 2010, 36, 1466–1473.
74. Leung, Y.; Fung, T.; Mi, J.S.; Wu, W.Z. A rough set approach to the discovery of classification rules in spatial data. Int. J. Geogr. Inf. Sci. 2007, 21, 1033–1058.
Figure 1. Illustration of the SVM principle.
Figure 2. The process of the PSO algorithm.
Figure 3. The flowchart of the PSO-SVM model.
Figure 4. Flow-chart of our proposed method.
Figure 5. The superposition of classification maps of three environmental factors A1, A2 and A3.
Figure 6. The merging of prediction regions. (a) Merging of prediction regions that separate landslides; (b) merging of adjacent small regions that contain landslides and are far from other landslide areas; (c) preservation of large regions without landslides.
Figure 7. Location map of the study area. (a) Site map of the Three Gorges Reservoir; (b) site map of our study area; (c) digital elevation model (DEM) overlaid with previously investigated landslides. The red hatched regions represent previously investigated landslides in the study area.
Figure 8. Geological maps of the study area. (a) Geological and tectonic sketch; (b) a schematic geological cross-section.
Figure 9. Numbers of landslides for the different bedding structures.
Figure 10. Importance values of the remaining 19 environmental factors.
Figure 11. The GWR coefficient map of the catchment slope.
Figure 12. The GWR coefficient values and classification maps of environmental factors. (a) Catchment slope; (b) distance from drainage; (c) NDVI.
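Step 3 of Table 2 classifies the GWR coefficient values of each factor into N classes with the natural breaks method, as in the classification maps of Figure 12. The sketch below assumes that "natural breaks" refers to the Fisher–Jenks optimal partition, which minimizes the total within-class sum of squared deviations; GIS software may use faster heuristics, and the function names are hypothetical. The dynamic program costs O(N·n²) for n values, so it only suits modest inputs.

```python
from bisect import bisect_left


def natural_breaks(values, n_classes):
    """Fisher-Jenks optimal partition of sorted values into n_classes contiguous
    classes; returns the upper bound of each class (hypothetical helper)."""
    data = sorted(values)
    n = len(data)
    # Prefix sums give the squared-deviation cost of any slice in O(1).
    s1, s2 = [0.0] * (n + 1), [0.0] * (n + 1)
    for i, v in enumerate(data):
        s1[i + 1] = s1[i] + v
        s2[i + 1] = s2[i] + v * v

    def ssd(i, j):  # sum of squared deviations of data[i:j]
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[k][j]: minimal cost of splitting data[:j] into k non-empty classes.
    cost = [[inf] * (n + 1) for _ in range(n_classes + 1)]
    back = [[0] * (n + 1) for _ in range(n_classes + 1)]
    cost[0][0] = 0.0
    for k in range(1, n_classes + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                c = cost[k - 1][i] + ssd(i, j)
                if c < cost[k][j]:
                    cost[k][j], back[k][j] = c, i
    # Walk back through the table to recover the class upper bounds.
    bounds, j = [], n
    for k in range(n_classes, 0, -1):
        bounds.append(data[j - 1])
        j = back[k][j]
    return bounds[::-1]


def classify(value, bounds):
    """1-based class index of a value given ascending class upper bounds."""
    return bisect_left(bounds, value) + 1


coefficients = [0.10, 0.11, 0.12, 0.30, 0.33, 0.35, 0.58, 0.60, 0.61]
breaks = natural_breaks(coefficients, 3)
print(breaks)                  # [0.12, 0.35, 0.61]
print(classify(0.33, breaks))  # 2
```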
Figure 13. The resultant prediction regions of the study area. (a) The resultant prediction region map after the superposition process; (b) the final prediction region map after the merging process.
Figure 14. The landslide susceptibility map produced by the GWR-PSO-SVM model.
Figure 15. Landslide susceptibility zoning using the fixed interval method. (a) SVM; (b) PSO-SVM; (c) RS-SVM; (d) GWR-SVM; (e) GWR-PSO-SVM.
Figure 16. The class-specific accuracies of the different prediction models using the fixed interval method.
Figure 17. The ROC curves of all the prediction models.
Figure 18. The importance values of the environmental factors in different prediction regions.
Figure 19. The performance of segmentation with respect to different values of N. (a) N = 2; (b) N = 3; (c) N = 4.
Figure 20. The ROC curves with different sample sets for the five regions. (a) The prediction region No. 9; (b) the ROC curves and AUCs corresponding to (a); (c) the prediction region No. 25; (d) the ROC curves and AUCs corresponding to (c); (e) the prediction region No. 26; (f) the ROC curves and AUCs corresponding to (e); (g) the prediction region No. 28; (h) the ROC curves and AUCs corresponding to (g); (i) the prediction region No. 28; (j) the ROC curves and AUCs corresponding to (i).
Table 1. Procedures of the PSO-SVM algorithm.
Input: Training and Verification Samples.
Output: The Result of the PSO-SVM Model.
1. Initialization: Generate initial particles, each consisting of the SVM parameters C and γ. Set the PSO parameters, including the population size, the maximum number of iterations, the learning factors, the initial particle swarm locations, the random flight velocities and two random numbers in the range [0, 1]. Set the iteration counter n = 0, and perform the training process of steps 3–9 for each particle.
2. Data set: Select the training and verification samples.
3. Set n = n + 1.
4. SVM model training: Conduct 10-fold cross-validation (CV) on the training samples and calculate the average CV accuracy for the current (C, γ).
5. Evaluate the fitness of each particle as the average CV accuracy obtained in step 4.
6. Update the global and local optimal solutions according to the result of the fitness evaluation.
7. Move each particle to its new location $x_i^n$ with velocity $V_i^n$ according to Equation (10).
8. Compare the local optimal solution of the ith particle found so far, $p_i^n$, with its new location $x_i^n$; the better of the two becomes $p_i^{n+1}$ for iteration n + 1. The global optimal solution of all particles up to iteration n, $p_g^n$, is updated in the same way.
9. Stopping condition: If the predefined maximum number of iterations has not been reached, go to step 3; otherwise, go to step 10.
10. To avoid overtraining, take the iteration with the best CV accuracy as the stopping iteration.
11. Build the SVM model on the verification samples using the optimal parameters (C, γ) obtained at the stopping iteration determined in step 10.
12. End the training and verification procedure and obtain the result of the PSO-SVM model.
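The PSO loop of Table 1 can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' code: Equation (10) is not reproduced in this section, so the sketch uses the common inertia-weight velocity update v ← w·v + c1·r1·(p_i − x) + c2·r2·(p_g − x) with hypothetical settings w = 0.7 and c1 = c2 = 1.5. The fitness function `toy_cv_accuracy` is a made-up smooth stand-in for the 10-fold CV accuracy of an RBF-kernel SVM; with scikit-learn, a real fitness could be `cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=10).mean()`. The returned iteration index is the last one at which the best fitness improved, mirroring step 10.

```python
import random


def pso_maximize(fitness, bounds, n_particles=20, n_iter=50,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Maximize `fitness` over a box with a global-best PSO.

    bounds: one (low, high) pair per dimension, here (C, gamma).
    Returns (best position, best fitness, iteration of the last improvement).
    """
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 1: random initial positions and small random velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.1 * rng.uniform(lo - hi, hi - lo) for lo, hi in bounds]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                   # local best of each particle
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit, best_iter = pbest[g][:], pbest_fit[g], 0

    for it in range(1, n_iter + 1):               # steps 3-9
        for i in range(n_particles):
            for d in range(dim):                  # step 7: new velocity and move
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            f = fitness(pos[i])                   # steps 4-5: evaluate fitness
            if f > pbest_fit[i]:                  # step 8: keep the better one
                pbest[i], pbest_fit[i] = pos[i][:], f
                if f > gbest_fit:
                    gbest, gbest_fit, best_iter = pos[i][:], f, it
    return gbest, gbest_fit, best_iter


def toy_cv_accuracy(params):
    """Hypothetical stand-in for 10-fold CV accuracy, peaking at C=4, gamma=0.5."""
    c, gamma = params
    return 0.9 - 0.01 * (c - 4.0) ** 2 - 0.5 * (gamma - 0.5) ** 2


best, best_fit, stop_iter = pso_maximize(toy_cv_accuracy, [(0.1, 10.0), (0.01, 1.0)])
print(best, best_fit, stop_iter)
```

A real CV-accuracy surface is noisy and piecewise constant, so in practice the swarm size and iteration budget would be tuned to the cost of each SVM fit.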
Table 2. Procedures of the GWR-PSO-SVM algorithm.
Input: Ancillary Data of the Study Area.
Output: The Landslide Susceptibility Map.
Step 1: Extract environmental factors
Extract environmental factors from the ancillary data, including digital elevation models, geological maps, topographical maps and remote sensing images. All data should be resampled to the same spatial resolution.
Each computing unit is assigned a value for each environmental factor.
Step 2: Environmental factor screening
Calculate the Pearson product-moment correlation coefficient (PPMCC) between every pair of environmental factors to exclude highly correlated factors. If the PPMCC of a pair is greater than a predefined threshold T1, the factor(s) to exclude are chosen according to the actual situation of the study area and previous research.
Calculate the importance value of each remaining environmental factor in the SVM model. Factors whose importance values are greater than a predefined threshold T2 are retained as the final environmental factors and used for the subsequent landslide prediction.
Step 3: Study area segmentation
Select an appropriate kernel function and information criterion according to Equations (4) and (5), respectively.
Calculate a GWR coefficient for each computing unit of each environmental factor according to Equations (1)–(3), using as input the geographic coordinates of the center point of each computing unit and the factor values of all computing units obtained in Step 1.
Divide each environmental factor into N classes by applying the natural breaks method to its GWR coefficient values. In this work, M environmental factors, which are determined in Step 1, are chosen for study area segmentation, producing M classification maps.
Superpose all the classification maps to obtain a superposed map, and merge very small regions in this map to generate the final prediction region map following the rules illustrated in Figure 6.
Step 4: PSO-SVM prediction
To perform SVM prediction in each prediction region, training samples are constructed from all the computing units with landslides and the same number of randomly selected computing units without landslides.
A two-class SVM classifier with the Gaussian RBF kernel is used for prediction, and the PSO algorithm is run to obtain the optimal C and γ of the SVM model for each prediction region. All the computing units are then used for landslide susceptibility mapping according to Equation (8). In the resultant map, probability values ranging from 0 to 100% represent different degrees of landslide susceptibility.
Merge the results of all prediction regions. All computing units in prediction regions without landslides are assigned a value of zero. Eventually, the final landslide susceptibility map of the study area is produced.
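The core of the GWR computation in Step 3 is a weighted least-squares fit repeated at every location. Because Equations (1)–(5) are not reproduced in this section, the sketch below assumes a standard formulation: a single explanatory factor, a Gaussian kernel $w_{ij} = \exp[-\tfrac{1}{2}(d_{ij}/b)^2]$ and a fixed, hand-picked bandwidth b, whereas the study selects the kernel and bandwidth with an information criterion. The function name and the toy transect are hypothetical.

```python
import math


def gwr_local_coefficients(coords, x, y, bandwidth):
    """Local intercept and slope of y = b0 + b1 * x at every location.

    Each fit weights observation j by a Gaussian kernel of its distance d_ij
    to the regression point i: w_ij = exp(-0.5 * (d_ij / bandwidth) ** 2).
    """
    coefs = []
    for ui, vi in coords:
        w = [math.exp(-0.5 * (math.hypot(ui - uj, vi - vj) / bandwidth) ** 2)
             for uj, vj in coords]
        # Weighted sums for the 2 x 2 normal equations (X^T W X) b = X^T W y.
        sw = sum(w)
        swx = sum(wj * xj for wj, xj in zip(w, x))
        swy = sum(wj * yj for wj, yj in zip(w, y))
        swxx = sum(wj * xj * xj for wj, xj in zip(w, x))
        swxy = sum(wj * xj * yj for wj, xj, yj in zip(w, x, y))
        det = sw * swxx - swx * swx
        b1 = (sw * swxy - swx * swy) / det
        b0 = (swy - b1 * swx) / sw
        coefs.append((b0, b1))
    return coefs


# Toy transect: the factor's effect is +2 in the west and -2 in the east.
coords = [(float(u), 0.0) for u in range(20)]
x = [float(u % 5) for u in range(20)]
y = [1 + 2 * xi if u < 10 else 1 - 2 * xi for u, xi in enumerate(x)]
local = gwr_local_coefficients(coords, x, y, bandwidth=1.0)
print(round(local[0][1], 3), round(local[19][1], 3))  # 2.0 -2.0
```

A global regression over the same transect would average the two opposite effects; the local slopes recover them, which is the spatial variation that the GWR coefficient maps (Figures 11 and 12) are meant to expose.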
Table 3. Classification of the bedding structure.
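| Type | Definition |
|---|---|
| Over-dip slope | $\lvert\alpha-\beta\rvert \in [0^\circ, 30^\circ)$ or $\lvert\alpha-\beta\rvert \in [330^\circ, 360^\circ)$; $\gamma > 10^\circ$ and $\delta > \gamma$ |
| Under-dip slope | $\lvert\alpha-\beta\rvert \in [0^\circ, 30^\circ)$ or $\lvert\alpha-\beta\rvert \in [330^\circ, 360^\circ)$; $\gamma > 10^\circ$ and $\delta < \gamma$ |
| Dip-oblique slope | $\lvert\alpha-\beta\rvert \in [30^\circ, 60^\circ)$ or $\lvert\alpha-\beta\rvert \in [300^\circ, 330^\circ)$ |
| Transverse slope | $\lvert\alpha-\beta\rvert \in [60^\circ, 120^\circ)$ or $\lvert\alpha-\beta\rvert \in [240^\circ, 300^\circ)$ |
| Anaclinal-oblique slope | $\lvert\alpha-\beta\rvert \in [120^\circ, 150^\circ)$ or $\lvert\alpha-\beta\rvert \in [210^\circ, 240^\circ)$ |
| Anaclinal slope | $\lvert\alpha-\beta\rvert \in [150^\circ, 210^\circ)$ |

α: slope aspect; β: bed dip direction; γ: bed dip angle; δ: slope angle.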
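Table 3 translates directly into a rule-based classifier. The sketch below assumes all angles are in degrees and that an aspect difference of 360° is equivalent to 0°. Table 3 does not define the case of a near-parallel slope with γ ≤ 10° (or δ = γ), so this hypothetical function returns `None` there rather than guessing a class.

```python
def bedding_structure(aspect, dip_direction, dip_angle, slope_angle):
    """Classify a slope unit following Table 3 (all angles in degrees).

    aspect = alpha, dip_direction = beta, dip_angle = gamma, slope_angle = delta.
    Returns None for combinations that Table 3 leaves undefined.
    """
    d = abs(aspect - dip_direction) % 360  # |alpha - beta|, with 360 treated as 0
    if d < 30 or d >= 330:
        if dip_angle > 10 and slope_angle > dip_angle:
            return "over-dip slope"
        if dip_angle > 10 and slope_angle < dip_angle:
            return "under-dip slope"
        return None  # gamma <= 10 or delta == gamma: not covered by Table 3
    if d < 60 or d >= 300:
        return "dip-oblique slope"
    if d < 120 or d >= 240:
        return "transverse slope"
    if d < 150 or d >= 210:
        return "anaclinal-oblique slope"
    return "anaclinal slope"  # 150 <= d < 210


print(bedding_structure(100, 90, 20, 35))  # over-dip slope
print(bedding_structure(10, 350, 40, 25))  # under-dip slope (|alpha - beta| = 340)
print(bedding_structure(0, 180, 20, 30))   # anaclinal slope
```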
Table 4. Landslide environmental factors and their respective values.
| Category | Environmental Factor | Value |
|---|---|---|
| Geomorphology | Elevation (m) | 124.2727–922.3077 |
| | Slope angle (°) | 3.2045–36.2898 |
| | Slope aspect (°) | 28.4827–321.5051 |
| | Terrain surface convexity (°/100 m) | 0.5979–0.2449 |
| | Plane curvature (°/100 m) | −0.4023–0.4832 |
| | Profile curvature (°/100 m) | −1.2441–1.2856 |
| | Slope form | (1) V/V; (2) GE/V; (3) X/V; (4) V/GR; (5) GE/GR; (6) X/GR; (7) V/X; (8) GE/X; (9) X/X |
| | Slope height (m) | 374.6390–3.6325 |
| | Mid-slope position | 0.1272–0.9491 |
| | Terrain surface texture | 0.8495–0.3018 |
| | Terrain roughness index | 1.1589–16.4521 |
| | Terrain convergence index | −27.6027–19.7669 |
| | Terrain curvature (°/100 m) | −1.5762–1.4682 |
| | Terrain position index | −14.6285–9.5591 |
| Geology | Lithology | (1) mudstone, shale and Quaternary deposits; (2) sandstones and thinly bedded limestones; (3) limestones and massive sandstones |
| | Bedding structure | (1) over-dip slope; (2) under-dip slope; (3) dip-oblique slope; (4) transverse slope; (5) anaclinal-oblique slope; (6) anaclinal slope |
| Hydrology | Catchment area (m²) | 1156.0378–105,783.4666 |
| | Catchment slope (°) | 0.0485–0.5675 |
| | Flow path length (m) | 50.1196–2352.5587 |
| | Valley depth (m) | 3.4642–258.2873 |
| | Stream power index | −617,299.4571–281,486.9383 |
| | Distance from drainage (m) | 18.4328–5637.6471 |
| | Topographic wetness index | 8.2193–14.7816 |
| | Vertical distance to channel network (m) | −184.3475–461.4196 |
| Land cover | Land use | (1) water; (2) residential; (3) forest; (4) agriculture; (5) grassland |
| | NDVI | −0.4856–0.8337 |
| | NDWI | 0.0206–0.69411 |
| Meteorology | Precipitation (mm) | 1134.0551–1192.7400 |
| Geophysics | Magnitude (Ms) | 1.2617–2.1209 |
Table 5. Correlations of geomorphological factors.
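| Environmental Factor | ELE | SLAN | SLAS | SLHE | SLFO | TST | TRI | TPI | TCI | MSLP | PLCU | PRCU | TCU | TSC |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ELE | 1 | | | | | | | | | | | | | |
| SLAN | 0.198 | 1 | | | | | | | | | | | | |
| SLAS | 0.022 | −0.099 | 1 | | | | | | | | | | | |
| SLHE | 0.321 | 0.581 | 0.03 | 1 | | | | | | | | | | |
| SLFO | −0.013 | 0.093 | 0.16 | 0.206 | 1 | | | | | | | | | |
| TST | −0.255 | −0.739 | 0.045 | −0.562 | −0.022 | 1 | | | | | | | | |
| TRI | 0.188 | 0.995 | −0.105 | 0.579 | 0.091 | −0.735 | 1 | | | | | | | |
| TPI | 0.125 | 0.138 | 0.122 | 0.338 | 0.761 | −0.062 | 0.133 | 1 | | | | | | |
| TCI | 0.117 | 0.054 | 0.221 | 0.241 | 0.787 | −0.013 | 0.047 | 0.810 | 1 | | | | | |
| MSLP | 0.08 | 0.007 | 0.015 | 0.176 | −0.163 | −0.143 | 0.025 | −0.15 | −0.16 | 1 | | | | |
| PLCU | −0.103 | 0.112 | 0.187 | 0.162 | 0.735 | −0.052 | 0.114 | 0.601 | 0.641 | −0.093 | 1 | | | |
| PRCU | −0.172 | −0.103 | −0.08 | −0.224 | −0.564 | 0.017 | −0.095 | −0.809 | −0.661 | 0.14 | −0.3 | 1 | | |
| TCU | 0.071 | 0.131 | 0.155 | 0.243 | 0.782 | −0.04 | 0.127 | 0.889 | 0.804 | −0.15 | 0.728 | −0.872 | 1 | |
| TSC | 0.083 | 0.155 | −0.015 | 0.356 | 0.169 | 0.172 | 0.142 | 0.204 | 0.165 | 0.021 | 0.034 | −0.2 | 0.161 | 1 |

ELE = elevation, SLAN = slope angle, SLAS = slope aspect, SLHE = slope height, SLFO = slope form, TST = terrain surface texture, TRI = terrain roughness index, TPI = terrain position index, TCI = terrain convergence index, MSLP = mid-slope position, PLCU = plane curvature, PRCU = profile curvature, TCU = terrain curvature, TSC = terrain surface convexity.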
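The PPMCC screening of Step 2 in Table 2 amounts to computing pairwise correlations and flagging pairs above T1. In the minimal sketch below, the threshold value 0.8 is hypothetical (T1 is not stated in this section), comparing |r| rather than r is an assumption made so that strong negative correlations are flagged too, and only a handful of coefficients from Table 5 are used. Choosing which factor of each flagged pair to drop is left to expert judgment, as Table 2 describes.

```python
import math


def pearson(xs, ys):
    """Pearson product-moment correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sxx = sum((a - mx) ** 2 for a in xs)
    syy = sum((b - my) ** 2 for b in ys)
    return sxy / math.sqrt(sxx * syy)


def correlated_pairs(corr, threshold):
    """Factor pairs whose |r| exceeds `threshold`, strongest first."""
    hits = [(a, b, r) for (a, b), r in corr.items() if abs(r) > threshold]
    return sorted(hits, key=lambda hit: -abs(hit[2]))


# A few coefficients copied from Table 5.
table5 = {
    ("ELE", "SLAN"): 0.198,
    ("SLAN", "TRI"): 0.995,
    ("SLFO", "TCI"): 0.787,
    ("TPI", "TCI"): 0.810,
    ("TPI", "TCU"): 0.889,
    ("PRCU", "TCU"): -0.872,
}
T1 = 0.8  # hypothetical threshold, not the value used in the study
for a, b, r in correlated_pairs(table5, T1):
    print(f"{a}-{b}: r = {r}")
```

With these inputs the flagged pairs are SLAN-TRI, TPI-TCU, PRCU-TCU and TPI-TCI; `pearson` is what would produce such coefficients from the per-unit factor values.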
Table 6. Correlations of hydrological factors.
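| Environmental Factor | DISD | CMA | FPL | TWI | VADE | CMSL | SPI | VDCN |
|---|---|---|---|---|---|---|---|---|
| DISD | 1 | | | | | | | |
| CMA | 0.011 | 1 | | | | | | |
| FPL | −0.109 | 0.551 | 1 | | | | | |
| TWI | −0.026 | 0.607 | 0.545 | 1 | | | | |
| VADE | −0.112 | 0.678 | 0.675 | 0.65 | 1 | | | |
| CMSL | −0.007 | 0.327 | 0.41 | 0.411 | 0.638 | 1 | | |
| SPI | −0.055 | −0.013 | 0.004 | −0.112 | −0.052 | −0.004 | 1 | |
| VDCN | −0.368 | 0.259 | 0.424 | 0.222 | 0.475 | 0.292 | 0.045 | 1 |

DISD = distance from drainage, CMA = catchment area, FPL = flow path length, TWI = topographic wetness index, VADE = valley depth, CMSL = catchment slope, SPI = stream power index, VDCN = vertical distance to channel network.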
Table 7. The numbers of the total slope-units and the landslide slope-units for each prediction region.
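| Region ID | Number of Slope-Units | Number of Landslide Slope-Units | Region ID | Number of Slope-Units | Number of Landslide Slope-Units |
|---|---|---|---|---|---|
| 1 | 59 | 9 | 18 | 75 | 18 |
| 2 | 51 | 5 | 19 | 40 | 0 |
| 3 | 8 | 2 | 20 | 63 | 14 |
| 4 | 59 | 0 | 21 | 52 | 12 |
| 5 | 52 | 5 | 22 | 54 | 12 |
| 6 | 17 | 0 | 23 | 52 | 13 |
| 7 | 61 | 19 | 24 | 57 | 15 |
| 8 | 61 | 0 | 25 | 71 | 24 |
| 9 | 138 | 29 | 26 | 134 | 60 |
| 10 | 57 | 9 | 27 | 10 | 0 |
| 11 | 38 | 12 | 28 | 80 | 36 |
| 12 | 80 | 0 | 29 | 9 | 0 |
| 13 | 21 | 2 | 30 | 76 | 12 |
| 14 | 90 | 23 | 31 | 7 | 0 |
| 15 | 64 | 8 | 32 | 47 | 22 |
| 16 | 77 | 14 | 33 | 70 | 31 |
| 17 | 42 | 0 | 34 | 37 | 10 |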
Table 8. The parameter settings of C and γ calculated by the PSO algorithm for the GWR-PSO-SVM model.
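| Model | Region ID | C | γ | Region ID | C | γ |
|---|---|---|---|---|---|---|
| GWR-PSO-SVM | 1 | 6.1826 | 0.13879 | 20 | 5.9453 | 0.29134 |
| | 2 | 1.2965 | 0.32455 | 21 | 5.3659 | 0.38439 |
| | 3 | 2.4682 | 0.31596 | 22 | 3.3548 | 0.17105 |
| | 5 | 1.4832 | 0.36957 | 23 | 5.8234 | 0.36851 |
| | 7 | 8.6235 | 0.51243 | 24 | 2.1629 | 0.47592 |
| | 9 | 4.1356 | 0.67572 | 25 | 3.2592 | 0.45665 |
| | 10 | 2.3659 | 0.49986 | 26 | 6.5359 | 0.67853 |
| | 11 | 2.6971 | 0.33645 | 28 | 6.2157 | 0.47935 |
| | 13 | 4.3651 | 0.42631 | 30 | 7.2853 | 0.63428 |
| | 14 | 5.8652 | 0.42375 | 32 | 6.4075 | 3.35874 |
| | 15 | 1.4964 | 0.56916 | 33 | 5.3364 | 0.47516 |
| | 16 | 4.7569 | 0.32793 | 34 | 4.8435 | 0.67203 |
| | 18 | 1.4259 | 0.47157 | – | – | – |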
Table 9. The training and verification sample of the five models.
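| Model | Region ID | Training Sample | Verification Sample | Region ID | Training Sample | Verification Sample |
|---|---|---|---|---|---|---|
| GWR-PSO-SVM and GWR-SVM | 1 | 18 | 59 | 20 | 28 | 63 |
| | 2 | 10 | 51 | 21 | 24 | 52 |
| | 3 | 4 | 8 | 22 | 24 | 54 |
| | 5 | 10 | 52 | 23 | 26 | 52 |
| | 7 | 38 | 61 | 24 | 30 | 57 |
| | 9 | 58 | 138 | 25 | 48 | 71 |
| | 10 | 18 | 57 | 26 | 120 | 134 |
| | 11 | 24 | 38 | 28 | 72 | 80 |
| | 13 | 4 | 21 | 30 | 24 | 76 |
| | 14 | 46 | 90 | 32 | 44 | 47 |
| | 15 | 16 | 64 | 33 | 62 | 70 |
| | 16 | 28 | 77 | 34 | 20 | 37 |
| | 18 | 36 | 75 | | | |
| SVM | – | 832 | 1909 | | | |
| PSO-SVM | – | 832 | 1909 | | | |
| RS-SVM | – | 832 | 1909 | | | |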
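Tables 7 and 9 are consistent with the sampling rule in Step 4 of Table 2: in every prediction region with landslides, the training sample is twice the number of landslide slope-units, and the verification sample is the region's full set of slope-units (summing to 832 and 1584, respectively). A sketch of the balanced draw follows, assuming each slope-unit is represented as a hypothetical `(unit_id, has_landslide)` pair. Note that `random.sample` raises `ValueError` if a region has fewer non-landslide than landslide units, which does not occur in Table 7.

```python
import random


def build_training_sample(units, seed=0):
    """All landslide slope-units plus an equal number of randomly drawn
    non-landslide slope-units from the same prediction region.

    units: list of (unit_id, has_landslide) pairs (hypothetical representation).
    """
    rng = random.Random(seed)
    positives = [u for u in units if u[1]]
    negatives = [u for u in units if not u[1]]
    # random.sample raises ValueError if there are too few negatives.
    return positives + rng.sample(negatives, len(positives))


# Region 9 in Table 7: 138 slope-units, 29 of them with landslides.
region9 = [(i, i < 29) for i in range(138)]
training = build_training_sample(region9)
verification = region9  # every slope-unit of the region
print(len(training), len(verification))  # 58 138
```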
Table 10. Overall accuracies by all the prediction models.
| Model | Correct | Total | Accuracy |
|---|---|---|---|
| SVM | 1415 | 1909 | 74.12% |
| PSO-SVM | 1590 | 1909 | 83.29% |
| RS-SVM | 1427 | 1909 | 74.75% |
| GWR-SVM | 1140 | 1584 | 71.97% |
| GWR-PSO-SVM | 1443 | 1584 | 91.10% |
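The overall accuracies in Table 10 are simply correct/total, and the snippet below recomputes them from the table's own counts. The GWR-based models are scored on 1584 slope-units rather than 1909; according to Table 7, the 325-unit difference equals the slope-units in the nine prediction regions without landslides, which are assigned zero rather than classified.

```python
# (correct, total) pairs from Table 10.
results = {
    "SVM": (1415, 1909),
    "PSO-SVM": (1590, 1909),
    "RS-SVM": (1427, 1909),
    "GWR-SVM": (1140, 1584),
    "GWR-PSO-SVM": (1443, 1584),
}


def overall_accuracy(correct, total):
    """Percentage of verification slope-units classified correctly."""
    return 100.0 * correct / total


for model, (correct, total) in results.items():
    print(f"{model}: {overall_accuracy(correct, total):.2f}%")
# SVM: 74.12%, PSO-SVM: 83.29%, RS-SVM: 74.75%, GWR-SVM: 71.97%, GWR-PSO-SVM: 91.10%
```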
Table 11. The AUC of the five models.
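| Model | Area | Std. Error | Asymptotic Sig. | Asymptotic 95% Confidence Interval: Lower Bound | Asymptotic 95% Confidence Interval: Upper Bound |
|---|---|---|---|---|---|
| SVM | 0.817 | 0.011 | 0.000 | 0.796 | 0.837 |
| PSO-SVM | 0.869 | 0.010 | 0.000 | 0.850 | 0.889 |
| RS-SVM | 0.825 | 0.010 | 0.000 | 0.804 | 0.845 |
| GWR-SVM | 0.860 | 0.009 | 0.000 | 0.842 | 0.878 |
| GWR-PSO-SVM | 0.971 | 0.004 | 0.000 | 0.963 | 0.978 |

Std. = standard; Sig. = significance.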
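The AUC values in Table 11 can be computed nonparametrically as the probability that a randomly chosen landslide unit receives a higher susceptibility score than a randomly chosen non-landslide unit (the Mann–Whitney formulation, with ties counted as one half). The standard error below uses the Hanley–McNeil approximation, which is one common choice and not necessarily the estimator behind the asymptotic values in Table 11. The scores and labels are made up for illustration, and the function names are hypothetical.

```python
import math


def auc_mann_whitney(scores, labels):
    """AUC as P(score of a landslide unit > score of a non-landslide unit);
    ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hanley_mcneil_se(auc, n_pos, n_neg):
    """Hanley-McNeil approximation of the standard error of an AUC."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc * auc / (1 + auc)
    var = (auc * (1 - auc) + (n_pos - 1) * (q1 - auc * auc)
           + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    return math.sqrt(var)


# Made-up susceptibility scores; 1 = landslide unit, 0 = non-landslide unit.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
auc = auc_mann_whitney(scores, labels)
se = hanley_mcneil_se(auc, labels.count(1), labels.count(0))
print(round(auc, 4), round(auc - 1.96 * se, 4), round(auc + 1.96 * se, 4))
```

The pairwise loop is O(n_pos·n_neg); for thousands of slope-units, a rank-based computation (or a library routine such as scikit-learn's `roc_auc_score`) gives the same AUC faster.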

Share and Cite

MDPI and ACS Style

Yu, X.; Wang, Y.; Niu, R.; Hu, Y. A Combination of Geographically Weighted Regression, Particle Swarm Optimization and Support Vector Machine for Landslide Susceptibility Mapping: A Case Study at Wanzhou in the Three Gorges Area, China. Int. J. Environ. Res. Public Health 2016, 13, 487. https://doi.org/10.3390/ijerph13050487

AMA Style

Yu X, Wang Y, Niu R, Hu Y. A Combination of Geographically Weighted Regression, Particle Swarm Optimization and Support Vector Machine for Landslide Susceptibility Mapping: A Case Study at Wanzhou in the Three Gorges Area, China. International Journal of Environmental Research and Public Health. 2016; 13(5):487. https://doi.org/10.3390/ijerph13050487

Chicago/Turabian Style

Yu, Xianyu, Yi Wang, Ruiqing Niu, and Youjian Hu. 2016. "A Combination of Geographically Weighted Regression, Particle Swarm Optimization and Support Vector Machine for Landslide Susceptibility Mapping: A Case Study at Wanzhou in the Three Gorges Area, China" International Journal of Environmental Research and Public Health 13, no. 5: 487. https://doi.org/10.3390/ijerph13050487

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
