Introduction

Landslide disasters have caused devastating damage to the environment, life, and property (Hong et al. 2020). Robust and accurate displacement prediction is a key component of an early warning system. Recently, machine learning (ML) algorithms have become predominant approaches for landslide displacement prediction due to their capacity to model nonlinear complex processes. Among them, backpropagation (BP) neural networks have been extensively utilized due to their simple structure and acceptable accuracy (Du et al. 2013). In addition to BP, support vector machines (SVMs) (Liu et al. 2014), extreme learning machines (ELMs) (Cao et al. 2016), recurrent neural networks (Xing et al. 2020; Niu et al. 2021), and their variants (Ma et al. 2020b) have also been utilized for landslide displacement prediction.

Hyperparameter tuning is a crucial step for accurate and reliable ML (Yang and Shami 2020; Zhang et al. 2020c). However, hyperparameter tuning in ML-based prediction models is usually based on trial and error. Recently, as summarized in Table 7 in the Appendix, modern metaheuristic algorithms have been extensively utilized for hyperparameter optimization in ML-based landslide displacement prediction. As shown, metaheuristic algorithms, including artificial bee colony (ABC) optimization algorithms (Zhou et al. 2018a), genetic algorithms (GAs) (Li and Kong 2014; Cai et al. 2016; Miao et al. 2017), gray wolf optimization (GWO) algorithms (Guo et al. 2020; Liao et al. 2020), particle swarm optimization (PSO) algorithms (Zhou et al. 2016; Zhang et al. 2020b), and water cycle algorithms (WCAs) (Zhang et al. 2021b), have been combined with ML algorithms and extensively studied for landslide displacement prediction. As shown in Table 7 in the Appendix, the performance of hybrid metaheuristic and ML approaches has been proven to be competitive. In particular, support vector regressions (SVRs), i.e., the use of SVM for regression, have been extensively integrated with metaheuristics for landslide displacement prediction.

Despite their extensive application, these algorithms suffer from poor reproducibility across replicate cases (Ma and Mei 2021). As listed in Table 7 in the Appendix, in previous performance comparisons, only the deterministic optimal estimation was considered, and only a single-run comparison was conducted. However, due to the inherent stochastic nature of these algorithms (Gao et al. 2020; Ahmed et al. 2021), the same metaheuristic algorithm may yield different optimal solutions in multiple runs (Ahmed et al. 2021). The solutions in even superior models deviate strongly for a given case, which means that ideal results from a single run are hard to replicate on similar cases. For example, PSO-optimized SVR (PSO-SVR) was found to be superior to GA-optimized SVR (GA-SVR) (Zhou et al. 2016). However, completely opposite results were achieved in the research of Miao et al. (2017), which raises questions concerning the repeatability of trained models based on a single run. A systematic comparison of benchmark cases and a presentation of the statistical significance are recommended to increase the repeatability (Ma and Mei 2021).

In the present study, a hybrid approach integrating k-fold cross-validation (CV), metaheuristic SVR, and the nonparametric Friedman test is proposed to enhance reproducibility by presenting the statistical significance. Observations from the Shuping and Baishuihe landslides in the Three Gorges Reservoir area (TGRA) are selected as benchmark datasets for the comprehensive comparison of SVRs optimized by metaheuristics, including ABC, GA, GWO, PSO, and WCA. Nonparametric Friedman tests are performed to reveal significant differences and to rank the five metaheuristics.

Methodology

SVR

SVM, which was proposed by Cortes and Vapnik (1995), is considered a powerful and robust ML algorithm for classification and regression (Raghavendra and Deka 2014; Malik et al. 2020). SVR is a regression approach based on an SVM. For a set of landslide monitoring data \(\{ x_{i} ,y_{i} \}_{i = 1}^{n}\), nonlinear SVR with a kernel function \(K(x_{i} ,x)\) is formulated as follows:

$$y = f(x) = \sum\limits_{i = 1}^{n} {\omega_{i} K(x_{i} ,x) + b}$$
(1)

where \(\omega\) and b are the weight vector and bias, respectively.

A nonlinear SVR form can be obtained by the following optimization problem:

$$\mathop {\min }\limits_{\omega ,b} \frac{1}{2}\left\| \omega \right\|^{2} + C\sum\limits_{t = 1}^{T} {\left| {\xi_{t} + \xi_{t}^{*} } \right|}$$
(2)

where \(C\), \(\xi_{t}\), and \({\xi }_{t}^{*}\) are the penalty parameter and two slack variables, respectively.

By introducing the Lagrange multipliers \(a_{i}^{*}\) and \(a_{i}\), nonlinear SVR can be converted to a dual problem and expressed as follows:

$$y = f(x) = \sum\limits_{i = 1}^{n} {(a_{i}^{*} - a_{i} )} K(x_{i} ,x) + b$$
(3)

Various kernels, including linear, polynomial, Gaussian, and sigmoid kernels, have been proposed. Previous kernel research (Ahmadi et al. 2015; Karasu et al. 2020) has already indicated that the Gaussian kernel can be safely applied as it provides accurate results. Thus, the most widely applied Gaussian kernel is adopted, which is expressed as follows:

$$K(x,x_{i} ) = \exp [ - \frac{{(x - x_{i} )^{2} }}{{2\sigma^{2} }}]$$
(4)

where \(\sigma\) is the width of the Gaussian kernel.

Gaussian kernel SVR is sensitive to the hyperparameters C and \(\sigma\). In the present study, five metaheuristic algorithms, including ABC, GA, GWO, PSO, and WCA, are applied for hyperparameter tuning (Fig. 1a).
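To make the role of \(\sigma\) concrete, the following minimal sketch evaluates the Gaussian kernel of Eq. (4) and the dual-form prediction of Eq. (3) in plain Python. The function names and the toy support vectors, coefficients, and bias are hypothetical and serve only to exercise the formulas; they are not values from the present study.

```python
import math
from typing import Sequence


def gaussian_kernel(x: Sequence[float], xi: Sequence[float], sigma: float) -> float:
    """Eq. (4): exp(-||x - xi||^2 / (2 * sigma^2))."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, xi))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def svr_predict(x, support_vectors, dual_coefs, bias, sigma):
    """Eq. (3): sum_i (a_i^* - a_i) * K(x_i, x) + b.

    dual_coefs[i] holds the difference (a_i^* - a_i) for support vector i.
    """
    return bias + sum(
        coef * gaussian_kernel(x, sv, sigma)
        for sv, coef in zip(support_vectors, dual_coefs)
    )


# Hypothetical toy values, chosen only to exercise the two functions.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
dual_coefs = [0.5, -0.25]
prediction = svr_predict([0.0, 0.0], support_vectors, dual_coefs, bias=0.1, sigma=1.0)
```

A small \(\sigma\) makes the kernel decay quickly with distance, so each support vector influences only its immediate neighborhood, whereas a large \(\sigma\) yields a smoother prediction. Libraries that parameterize the same kernel as \(\exp(-\gamma \|x - x_{i}\|^{2})\) correspond to \(\gamma = 1/(2\sigma^{2})\).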

Fig. 1 Flowchart of enhancing the ML-based prediction model of landslide displacement using CV-metaheuristic-SVR and the nonparametric Friedman test

Metaheuristic algorithms for hyperparameter optimization of SVR

Metaheuristic algorithms are often nature-inspired computational intelligence methods for optimal solution approximation (Khan et al. 2021). Recently, various metaheuristics, such as ABC, GA, GWO, PSO, and WCA, have been widely utilized for landslide displacement prediction due to their optimization strengths. The main characteristics of these algorithms are listed in Table 8 in the Appendix. As shown in this table, the optimization processes usually start with a random generation of possible solutions called a population. Then, the generated population is randomly and iteratively updated (i.e., the exploration and exploitation phases) until the predetermined criteria are met. Exploration and exploitation refer to encountering new regions and searching within the corresponding neighborhood, respectively (Morales-Castañeda et al. 2020). The stochastic nature of metaheuristic algorithms makes it necessary to implement multiple runs (Eskandar et al. 2012; Babaoglu 2015; Bahreininejad 2019; Wang et al. 2019a; Abderazek et al. 2020). Thus, in the present study, each metaheuristic was independently run 100 times.

  1. ABC

    ABC, a swarm-based metaheuristic algorithm, emulates the foraging behavior of bees for optimization (Karaboga and Basturk 2007). A typical ABC consists of two main components, a food source and a bee colony, which consists of employed, onlooker, and scout bees. The position of a food source represents a possible solution. Recently, ABC optimization has been successfully applied for landslide displacement prediction (Zhou et al. 2018a; Zhang et al. 2021a). The main procedure of ABC is listed in Table 8 in the Appendix.

  2. GA

    As the name implies, the GA concept is inspired by the evolution process and mainly involves crossover and mutation. GAs have been extensively used for landslide displacement prediction (Li and Kong 2014; Cai et al. 2016; Miao et al. 2017; Zhu et al. 2017). The main procedure of the GA is listed in Table 8 in the Appendix.

  3. GWO

    GWO, a new swarm-based metaheuristic algorithm, mimics the hunting behavior of gray wolves (Mirjalili et al. 2014). The position of a gray wolf represents a possible solution. A gray wolf group consists of alpha, beta, delta, and omega wolves, which represent the best, second-best, third-best, and remaining solutions, respectively. The positions are simultaneously updated based on the three best solutions. The main procedure of GWO is listed in Table 8 in the Appendix.

  4. PSO

    PSO is a swarm-based metaheuristic algorithm that simulates the social behavior of bird flocking (Kennedy and Eberhart 1995) and has gained substantial attention for landslide displacement prediction. The particle position, which represents a possible solution, is updated based on the individual and global optima. The main PSO procedure is listed in Table 8 in the Appendix.

  5. WCA

    WCA is a novel physical-based metaheuristic algorithm that simulates the water cycle process (Eskandar et al. 2012), in which water flows into the sea after water from precipitation, streams, and rivers is combined. WCA starts with a random generation of raindrops that represent possible solutions. In addition, the best individual is chosen as the sea. The main WCA procedure is listed in Table 8 in the Appendix.

  6. SVR optimized by metaheuristic techniques

    The main procedures of SVR optimized by metaheuristic techniques are as follows: first, the landslide observation is divided into training and test datasets. Second, parameters such as population size and the maximum number of iterations are initiated, and possible solutions consisting of hyperparameters C and \(\sigma\) are generated for training SVR. Third, the fitness values of the trained SVR are calculated and evaluated. Fourth, the hyperparameters of SVR are randomly and iteratively updated according to the updating strategy until the predetermined criteria are met. If the predetermined criteria are satisfied, the best hyperparameters are output as the optimal SVR.
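The generate, evaluate, and update loop described above can be sketched with PSO as one representative metaheuristic. The sketch below uses the PSO settings reported later in this section (population size 50, 200 iterations, inertia weight decreasing linearly from 0.9 to 0.4, both coefficients equal to 2). The velocity clamping, the clipping of positions to the search range, and the toy fitness function are assumptions made for illustration. In the workflow of this study, the fitness would instead be the cross-validated error of an SVR trained with the candidate C and \(\sigma\).

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_minimize(
    fitness: Callable[[Sequence[float]], float],
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 50,
    max_iter: int = 200,
    w_start: float = 0.9,
    w_end: float = 0.4,
    c1: float = 2.0,
    c2: float = 2.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize `fitness` over a box; returns (best position, best fitness)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Assumption: velocities are clamped to 20% of each search range.
    v_max = [0.2 * (hi - lo) for lo, hi in bounds]

    # Random initial population of candidate solutions (particles).
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    vel = [[0.0] * dim for _ in range(pop_size)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = min(range(pop_size), key=pbest_fit.__getitem__)
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for it in range(max_iter):
        # Inertia weight decreases linearly from w_start to w_end.
        w = w_start - (w_start - w_end) * it / max(1, max_iter - 1)
        for i in range(pop_size):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d]
                     + c1 * r1 * (pbest[i][d] - pos[i][d])   # individual optimum
                     + c2 * r2 * (gbest[d] - pos[i][d]))     # global optimum
                vel[i][d] = max(-v_max[d], min(v_max[d], v))
                # Assumption: out-of-range positions are clipped to the bounds.
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            fit = fitness(pos[i])
            if fit < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pos[i][:], fit
                if fit < gbest_fit:
                    gbest, gbest_fit = pos[i][:], fit
    return gbest, gbest_fit


def toy_fitness(params: Sequence[float]) -> float:
    """Hypothetical stand-in for the cross-validated SVR error."""
    c, sigma = params
    return (c - 30.0) ** 2 + (sigma - 5.0) ** 2


best_params, best_fit = pso_minimize(toy_fitness, bounds=[(0.0, 100.0), (0.0, 100.0)])
```

Changing `seed` changes the random initial population and the random update coefficients, which is why repeated runs of the same metaheuristic can return different hyperparameters.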

k-fold CV

k-fold CV is the most popular approach for validation as it can mitigate overfitting (Chou and Thedja 2016). In the k-fold CV approach, the original training set is randomly divided into k subdatasets. A new training dataset is formed based on k-1 subdatasets. The remaining dataset is adopted as the validation set. A model is trained based on the newly formed training dataset and evaluated on the validation set. The performance measure from the first round is computed. The above processes are repeated k times. The performance measure from k-fold CV is the average value computed in the loop (schematically illustrated in Fig. 1).
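The k-fold procedure can be sketched as follows. The helper names (`k_fold_indices`, `cross_val_score`) and the mean-value predictor used in the example are hypothetical; any model that exposes fit and predict callables could be substituted, and the default of k = 4 mirrors the fourfold CV adopted in this study.

```python
import random
from typing import Callable, List, Sequence


def k_fold_indices(n_samples: int, k: int, seed: int = 0) -> List[List[int]]:
    """Randomly partition range(n_samples) into k folds of near-equal size."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_score(
    fit: Callable, predict: Callable, score: Callable,
    X: Sequence, y: Sequence[float], k: int = 4, seed: int = 0,
) -> float:
    """Average validation score over k rounds."""
    scores = []
    for val_idx in k_fold_indices(len(X), k, seed):
        held_out = set(val_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        # Train on k-1 folds, evaluate on the remaining fold.
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in val_idx])
        scores.append(score([y[i] for i in val_idx], preds))
    return sum(scores) / k


# Hypothetical example: a model that always predicts the training mean.
def fit_mean(X, y):
    return sum(y) / len(y)


def predict_mean(model, X):
    return [model] * len(X)


def mse(obs, sim):
    return sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs)


X_toy = [[float(i)] for i in range(12)]
y_toy = [2.0 * i for i in range(12)]
cv_mse = cross_val_score(fit_mean, predict_mean, mse, X_toy, y_toy, k=4)
```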

Evaluation criteria

The representative equations, features, and characteristics of common statistical indices (e.g., the mean absolute error (MAE), root mean square error (RMSE), and correlation coefficient (R)) are summarized in Table 9 in the Appendix. Previous studies (Yang et al. 2020) have shown that the utilization of square values can enhance the evaluation of model performance. Therefore, the evaluation criteria, including the RMSE and Kling-Gupta efficiency (KGE) from 100 runs, were obtained and applied to compare model performance.
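For reference, the two criteria can be computed as in the sketch below. The KGE expression follows the commonly used form \(1 - \sqrt{(r - 1)^{2} + (\alpha - 1)^{2} + (\beta - 1)^{2}}\), where \(r\) is the correlation coefficient, \(\alpha\) is the ratio of standard deviations, and \(\beta\) is the ratio of means; it is an assumption that this matches the definition given in Table 9, which is not reproduced here.

```python
import math
from typing import Sequence


def rmse(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Root mean square error between observations and predictions."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def kge(obs: Sequence[float], sim: Sequence[float]) -> float:
    """Kling-Gupta efficiency (assumed standard form); 1 is a perfect score.

    Undefined when either series is constant or the observed mean is zero.
    """
    n = len(obs)
    mu_o, mu_s = sum(obs) / n, sum(sim) / n
    sd_o = math.sqrt(sum((o - mu_o) ** 2 for o in obs) / n)
    sd_s = math.sqrt(sum((s - mu_s) ** 2 for s in sim) / n)
    cov = sum((o - mu_o) * (s - mu_s) for o, s in zip(obs, sim)) / n
    r = cov / (sd_o * sd_s)       # linear correlation
    alpha = sd_s / sd_o           # variability ratio
    beta = mu_s / mu_o            # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


# Hypothetical example: predictions with a constant offset of +1.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [o + 1.0 for o in observed]
example_rmse = rmse(observed, predicted)
example_kge = kge(observed, predicted)
```

In this example the offset leaves the correlation and variability terms perfect, so the KGE is penalized only through the bias ratio, which illustrates why the RMSE and KGE can rank models differently.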

Nonparametric Friedman test

In the present study, the aim of the nonparametric Friedman test is to present significant differences among the five metaheuristic algorithms to increase the repeatability. The steps in the nonparametric Friedman test are mainly summarized as follows (Ganaie and Tanveer 2020; Banaie-Dezfouli et al. 2021):

  1. Gather evaluation criteria for each metaheuristic algorithm over 100 runs.

  2. For the ith run, the k tested metaheuristic algorithms are ranked from best to worst as 1 to k; the rank of the jth algorithm is denoted as \(r_{i}^{j}\).

  3. For the jth algorithm, average the obtained ranks over the n = 100 runs: \(R_{j} = \frac{1}{n}\sum\limits_{i = 1}^{n} {r_{i}^{j} }\).

  4. The nonparametric Friedman statistic \(F_{f}\) is expressed as follows:

    $$F_{f} = \frac{12n}{{k(k + 1)}}[\sum\limits_{j} {R_{j}^{2} } - \frac{{k(k + 1)^{2} }}{4}]$$
    (5)

In the nonparametric test, a p value is used to decide whether to reject the null hypothesis that all tested algorithms perform equivalently. A p value < 0.05 indicates that the null hypothesis should be rejected, which reveals a statistically significant difference among the tested metaheuristic algorithms (Korkmaz et al. 2021).
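The ranking steps and Eq. (5) can be sketched in plain Python as follows. Two points are assumptions rather than statements from the text: tied values within a run receive the average of the tied ranks, and the p value is taken from the chi-square approximation with k − 1 degrees of freedom. The closed-form survival function used here holds only for an even number of degrees of freedom, which covers the k = 5 algorithms compared in this study (four degrees of freedom).

```python
import math
from typing import List, Sequence, Tuple


def rank_row(values: Sequence[float], lower_is_better: bool = True) -> List[float]:
    """Rank one run from best (1) to worst (k); ties share the average rank."""
    order = sorted(range(len(values)), key=lambda j: values[j],
                   reverse=not lower_is_better)
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg = (pos + end) / 2.0 + 1.0
        for j in order[pos:end + 1]:
            ranks[j] = avg
        pos = end + 1
    return ranks


def friedman_statistic(
    results: Sequence[Sequence[float]], lower_is_better: bool = True
) -> Tuple[List[float], float]:
    """results[i][j] is the metric of algorithm j in run i. Returns (R_j, F_f)."""
    n, k = len(results), len(results[0])
    rank_sums = [0.0] * k
    for row in results:
        for j, r in enumerate(rank_row(row, lower_is_better)):
            rank_sums[j] += r
    mean_ranks = [s / n for s in rank_sums]
    # Eq. (5), without a correction for ties.
    stat = 12.0 * n / (k * (k + 1)) * (
        sum(R * R for R in mean_ranks) - k * (k + 1) ** 2 / 4.0
    )
    return mean_ranks, stat


def chi2_sf_even_dof(x: float, dof: int) -> float:
    """Chi-square survival function; closed form valid for even dof only."""
    if dof <= 0 or dof % 2:
        raise ValueError("closed form requires a positive even dof")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


# Hypothetical example: three algorithms, ten runs, identical ordering each run.
toy_results = [[1.0, 2.0, 3.0] for _ in range(10)]
mean_ranks, stat = friedman_statistic(toy_results)
p_value = chi2_sf_even_dof(stat, dof=len(toy_results[0]) - 1)
```

For a metric where larger values are better, such as the KGE, `lower_is_better=False` reverses the ranking direction.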

CV-metaheuristic-SVR and nonparametric Friedman test for enhancing the ML model

The main steps of CV-metaheuristic-SVR and the nonparametric Friedman test for enhancing ML (illustrated in Fig. 1) are as follows:

  1. Data preparation: Based on previous studies listed in Table 7 in the Appendix (Zhou et al. 2016; Ma et al. 2018, 2020a), the widely applied inputs, including accumulated precipitation in the current month and over the past 2 months (\(x_{1}\) and \(x_{2}\), respectively), average reservoir level in the current month (\(x_{3}\)), variation in the reservoir level in the current month (\(x_{4}\)), and displacement in the past 1, 2, and 3 months (\(x_{5}\), \(x_{6}\), and \(x_{7}\), respectively), were selected as the candidate input pool. The key variables with a maximum information coefficient (MIC) greater than 0.3 (Wang et al. 2019b, 2021) were adopted to remove redundant and irrelevant variables from the candidate pool (Ma et al. 2022). The ratio of training to testing data was set to 80% to 20%.

  2. k-fold cross-validation: Based on previous studies of k-fold cross-validation in geohazards (Ghorbanzadeh et al. 2020; Meena et al. 2021), fourfold CV was adopted in the present study.

  3. Parameter initialization: The parameters were initialized, and possible solutions consisting of the hyperparameters C and \(\sigma\) were generated. The search ranges for the penalty factor and width of the Gaussian kernel were set to [0, 100] and [0, 100], respectively (Miao et al. 2017). For the metaheuristics compared in the present study, the population size and the maximum number of iterations were set to 50 and 200, respectively. For ABC, the percentages of onlooker and employed bees were each 50%. In addition, the number of scout bees was set to one. For GA, the crossover and mutation probabilities were set to 0.85 and 0.05, respectively. For PSO, the inertia weight was set to linearly decrease from 0.9 to 0.4, and the two acceleration coefficients were both set to 2 (Ahmed et al. 2021). For WCA, the total number of rivers and seas and the maximum allowable distance between the river and sea were set to 10 and 1e-3 (Eskandar et al. 2012; Zhang et al. 2021b), respectively.

  4. Fitness evaluation: The average value of the normalized mean square error (NMSE) from fourfold CV was adopted as the fitness and evaluated before the optimization process started.

  5. Parameter updating: The hyperparameters C and \(\sigma\) were iteratively updated with the ABC, GA, GWO, PSO, and WCA methods until the predetermined stopping criteria were met. The best hyperparameters C and \(\sigma\) were output for optimal SVR modeling. Considering the inherent stochastic nature of these methods, the metaheuristic-based SVRs were independently run 100 times. The metaheuristic-based SVR methods were implemented using Python 3.8 in the Windows Subsystem for Linux (WSL) with Ubuntu 20.04 on an Intel Core i9-10900K @ 3.7 GHz with 64 GB of RAM.

  6. Nonparametric Friedman test: The RMSE, KGE, and computational time for each run were recorded. Nonparametric Friedman tests were performed based on the obtained RMSEs, KGEs, and computational times.
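As an illustration of the data preparation step, the sketch below assembles the seven candidate inputs from monthly series and splits them 80/20. Several details are assumptions, since the text does not specify them: the 2-month precipitation is taken as the sum of the current and previous months, the variation in reservoir level as the month-over-month difference, and the split as chronological. The function names are hypothetical, and the MIC screening is not shown.

```python
from typing import List, Sequence, Tuple


def build_inputs(
    rain: Sequence[float], level: Sequence[float], disp: Sequence[float]
) -> Tuple[List[List[float]], List[float]]:
    """Build rows [x1, ..., x7] and targets y from equal-length monthly series."""
    if not (len(rain) == len(level) == len(disp)):
        raise ValueError("all series must have the same length")
    X, y = [], []
    # Start at index 3 so that displacement lags of 1 to 3 months exist.
    for t in range(3, len(disp)):
        X.append([
            rain[t],                   # x1: precipitation, current month
            rain[t] + rain[t - 1],     # x2: past 2 months (assumed: current + previous)
            level[t],                  # x3: average reservoir level, current month
            level[t] - level[t - 1],   # x4: level variation (assumed: monthly difference)
            disp[t - 1],               # x5: displacement 1 month earlier
            disp[t - 2],               # x6: displacement 2 months earlier
            disp[t - 3],               # x7: displacement 3 months earlier
        ])
        y.append(disp[t])
    return X, y


def chronological_split(X, y, train_ratio: float = 0.8):
    """Assumed split: the earliest 80% of samples train, the rest test."""
    cut = int(round(train_ratio * len(X)))
    return X[:cut], y[:cut], X[cut:], y[cut:]


# Hypothetical monthly series, used only to show the shapes involved.
rain_toy = [float(i) for i in range(13)]
level_toy = [100.0 + i for i in range(13)]
disp_toy = [10.0 * i for i in range(13)]
X_all, y_all = build_inputs(rain_toy, level_toy, disp_toy)
X_train, y_train, X_test, y_test = chronological_split(X_all, y_all)
```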

Case study 1: Shuping landslide

Features of the Shuping landslide

The Shuping landslide, an ancient landslide, is situated in Zigui County, Yichang, TGRA, China (Figs. 2 and 3); this landslide has a length of 800 m, width of 700 m, and average thickness of 50 m. The landslide volume is approximately 27.5 million m³. The elevations of the landslide toe and crown are 60 and 400 m, respectively. The field investigation and borehole drilling show that the landslide materials are silty clay with gravel clasts underlain by marlstone and siltstone of the Triassic Badong Formation (Fig. 3c). A monitoring system consisting of a GPS and an inclinometer was installed for landslide monitoring (see Fig. 3b for the GPS and inclinometer locations). The sliding surface was observed at depths of 70 and 30 m from inclinometers QZK3 and QZK4, respectively. These results correspond well with the borehole data.

Fig. 2 Location of the case studies (marked with a red star) in the TGRA (marked in gray), China

Fig. 3 a Photograph, b 3D topographic map with instrumentation, and c geological profile of the Shuping landslide, TGRA. The inset graph in (c) shows lateral displacements from inclinometers QZK3 and QZK4

The Shuping landslide has been widely utilized as a case study for landslide displacement prediction (Ren et al. 2014; Wen et al. 2017; Ma et al. 2018; Zhou et al. 2018a; Wang et al. 2019b). The widely applied monitoring data from ZG88, the rainfall intensity, and the reservoir level from January 2007 to December 2012 (Fig. 4) indicate step-like movement patterns. Further details of the geological setting and deformation characteristics were provided in previous research by Ma et al. (2018).

Fig. 4 Observations of landslide displacement at ZG88, the reservoir level, and the rainfall intensity in the Shuping landslide area from January 2007 to December 2012

Input variable selection

The pairwise correlations of the landslide displacement at ZG88 with candidate variables are shown in Fig. 5. As shown, the MICs of all candidate variables with landslide displacement are greater than 0.3. Moreover, the strongest correlation was observed between the average reservoir water level and landslide displacement, followed by the correlation between the variation in the reservoir level and landslide displacement. These findings correspond well with previous research (Wang et al. 2022). Therefore, the key variables, including rainfall (\(x_{1}\) and \(x_{2}\)), reservoir water level (\(x_{3}\)), variation in the reservoir level (\(x_{4}\)), and evolution state (\(x_{5}\), \(x_{6}\), and \(x_{7}\)), were set as the final inputs for model training.

Fig. 5 Scatter matrix showing the pairwise correlations of the landslide displacement at ZG88 (y) with rainfall (\(x_{1}\) and \(x_{2}\)), reservoir water level (\(x_{3}\)), variation in the reservoir level (\(x_{4}\)), and evolution state (\(x_{5}\), \(x_{6}\), and \(x_{7}\)). The lower left panels show the MIC, and the upper right panels show the corresponding data points

Results comparison

Comparison of single predictions

The predictions from 100 separate runs and their corresponding mean values from metaheuristic-based SVR methods for the testing data are shown in Fig. 6a–f. Clearly, as shown in Fig. 6, the same metaheuristics yield different results for multiple runs due to their inherent stochastic nature. Noticeable deviations were observed among the 100 separate runs. The statistics for the 100 runs listed in Table 1 show that for the best single prediction, by using the RMSE criterion, GA provides the best prediction with the lowest RMSE. WCA yields the worst results. However, considering the KGE criterion, WCA outperforms the rest of the metaheuristic methods. As shown in Fig. 6 and Table 1, in terms of the RMSE and KGE criteria, the mean prediction from GA outperforms the other metaheuristic methods.

Fig. 6 a–e Predictions of landslide displacement for ZG88 by a ABC-SVR, b GA-SVR, c GWO-SVR, d PSO-SVR, and e WCA-SVR on the test dataset; f comparison of mean prediction from the metaheuristic-based SVR methods

Table 1 Comparison of the performance of the metaheuristic-based SVR methods for the Shuping landslide data

In summary, based on a single prediction, there is no guarantee for identifying one method as the best for the displacement prediction of the Shuping landslide, and further evaluations are needed.

Nonparametric statistical analysis

The Friedman test results for the metaheuristic-based SVR methods are listed in Table 1. As shown in this table, the p values for the Friedman tests of RMSE, KGE, and computational time are \(5.53 \times 10^{-49}\), \(7.09 \times 10^{-49}\), and \(2.62 \times 10^{-66}\), respectively. These results clearly demonstrate that for the five compared metaheuristic methods, there are significant differences in terms of precision and computational time. The corresponding rankings are listed in Table 1. As shown in this table, the rankings based on the \(F_{f}\) of the RMSE and KGE criteria exhibit the same pattern. GA and PSO ranked first and second, respectively, and WCA ranked last. The low rank of WCA may be due to trapping at local optima, which leads to premature convergence.

In summary, inconsistency from single-run comparisons has been addressed by the nonparametric Friedman test. Significant performance differences were revealed among the metaheuristic methods. GA achieves superior performance.

For the computational time, the metaheuristic-based SVRs ranked from fastest to slowest as follows: WCA, PSO, GA, ABC, and GWO. These results indicate that WCA is capable of finding the optimal result at the lowest computational cost. Both ABC and GWO are computationally demanding.

Sensitivity analysis

Model stability is another essential factor that should be considered in model comparison. The evaluation metrics (RMSE, KGE, and computational time) from 100 runs of the metaheuristic-based SVR methods are presented in Fig. 7. The metaheuristic-based SVR methods and corresponding evaluation metrics are shown on the vertical and horizontal axes, respectively. The statistical results, including the 10th and 90th percentile values and mean values, are shown with boxes and red lines, respectively. As shown, the WCA- and GA-based SVR methods provide significantly different results when run multiple times, which indicates that those two algorithms suffer from instability. It is evident that the evaluation metrics from the PSO-, ABC-, and GWO-based SVR methods over 100 runs exhibit narrow ranges of RMSE and KGE values. The predictions from the PSO-, ABC-, and GWO-based SVR methods shown in Fig. 6 are generally concentrated around the observations, indicating stable performance. However, WCA suffers from serious robustness issues, as further confirmed by its standard deviation, which was the largest among all methods (listed in Table 1). This result is mainly due to the unsatisfactory balance between exploitation and exploration, which leads to trapping at local optima and premature convergence. In fact, the exploration phase may not play a role in determining the final solution (Xu and Mei 2018; Nasir et al. 2020), which increases the burden of exploration.
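The stability statistics described above (mean, standard deviation, and the 10th and 90th percentiles over repeated runs) can be computed with the Python standard library, as sketched below. The function name and the two synthetic sets of RMSE values are hypothetical and are not results from this study; the percentile values depend on the interpolation method, for which the library default is used here.

```python
import random
import statistics
from typing import Dict, Sequence


def summarize_runs(values: Sequence[float]) -> Dict[str, float]:
    """Summary statistics of one metric over repeated runs."""
    deciles = statistics.quantiles(values, n=10)  # nine cut points
    return {
        "mean": statistics.fmean(values),
        "std": statistics.stdev(values),
        "p10": deciles[0],    # 10th percentile
        "p90": deciles[-1],   # 90th percentile
    }


# Hypothetical RMSE values from 100 runs of a stable and an unstable method.
rng = random.Random(0)
stable_rmse = [20.0 + rng.gauss(0.0, 0.1) for _ in range(100)]
unstable_rmse = [20.0 + rng.gauss(0.0, 2.0) for _ in range(100)]

stable_summary = summarize_runs(stable_rmse)
unstable_summary = summarize_runs(unstable_rmse)
```

A method with a narrow p10 to p90 band and a small standard deviation returns similar accuracy on every run, which is the sense in which stability is compared in this section.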

Fig. 7 Comparison of metaheuristic-based SVR methods for ZG88 in terms of the a RMSE, b KGE, and c computational time

Convergence analysis

The convergence fitness from the best runs (i.e., the lowest NMSE) and mean fitness value from 100 runs of different metaheuristic methods are shown in Fig. 8. The convergence curves display the following trends.

Fig. 8 Comparison of the optimal and mean fitness values for ZG88 of different metaheuristic methods

The convergence curve of the mean fitness value of GA remains far from the horizontal axis, which indicates that information carriers are still far from each other until the optimization process ends. This result is mainly caused by the poor local search capability of GAs (Belhaiza et al. 2019). The convergence curves of the swarm-based algorithms, including ABC, PSO, and GWO, reach near-optimal solutions after 120 iterations, which reflects premature convergence, as noted in previous research (Malik et al. 2015; Yang et al. 2020). WCA can converge to the optimal solution soonest based on the initial iterative process.
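Convergence curves of this kind can be summarized numerically, as in the following sketch. It averages the best-so-far fitness across runs and reports the first iteration after which a curve stays within a tolerance of its final value. The tolerance-based criterion and the function names are assumptions made for illustration; the text does not state how the convergence point was determined.

```python
from typing import List, Sequence


def mean_curve(curves: Sequence[Sequence[float]]) -> List[float]:
    """Iteration-wise mean of best-so-far fitness over several runs."""
    n_runs = len(curves)
    return [sum(vals) / n_runs for vals in zip(*curves)]


def iterations_to_converge(curve: Sequence[float], tol: float = 1e-3) -> int:
    """First iteration index from which the curve stays within tol of its end."""
    final = curve[-1]
    last_outside = -1
    for it, value in enumerate(curve):
        if abs(value - final) > tol:
            last_outside = it
    return last_outside + 1


# Hypothetical best-so-far fitness curves from two runs of one optimizer.
run_a = [1.0, 0.5, 0.2, 0.2]
run_b = [0.8, 0.4, 0.4, 0.2]
averaged = mean_curve([run_a, run_b])
converged_at = iterations_to_converge(run_a)
```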

Furthermore, the prediction models with the integration of CV and metaheuristic-based SVR were compared with existing models on the Shuping landslide (Table 2). As shown, the models based on CV-metaheuristic-SVR provide the best prediction with the largest R and lowest RMSE. These comparative results clearly indicate that CV and metaheuristic SVR can be employed to improve model performance by determining the optimal hyperparameters.

Table 2 Performance comparison of various prediction models for the Shuping landslide data

Case study 2: Baishuihe landslide

Features of the Baishuihe landslide

The Baishuihe landslide (Fig. 9), an ancient landslide, is situated on the south bank of the Yangtze River (see Fig. 2 for the location of this landslide). The Baishuihe landslide has an estimated volume of 12.6 million m³, with an average thickness of 30 m. The landslide covers an area of 0.42 km², with a length of 600 m and a width of 700 m (Fig. 9). The landslide encompasses an active block and a relatively stable block (Fig. 9a–b). The field investigation and borehole drilling show that the landslide materials are silty clay with gravel clasts (Fig. 9c). A monitoring system consisting of a GPS and an inclinometer was installed (see Fig. 9b–c for the locations of the GPS and inclinometer). The observed lateral displacement from ZK05 indicates shallow and deep sliding surfaces at depths of 13 and 23 m, respectively.

Fig. 9 a Photograph, b 3D topographic map with instrumentation, and c geological profile of the Baishuihe landslide, TGRA. The inset graph in (c) shows lateral displacements from inclinometer ZK05

The Baishuihe landslide has been widely selected as a case for landslide displacement prediction (Miao et al. 2017; Zhou et al. 2018b; Ma et al. 2022; Wang et al. 2022). In the present study, the widely applied monitoring data for XD01 were selected for training the landslide displacement model. The cumulative displacement of XD01, the reservoir level, and the rainfall intensity in the Baishuihe landslide area from January 2007 to December 2011 are shown in Fig. 10. The landslide displacement is characterized by step-like movement patterns.

Fig. 10 Observations of landslide displacement at XD01, the reservoir level, and the rainfall intensity in the Baishuihe landslide area from January 2007 to December 2011

Input variable selection

The pairwise correlations of the landslide displacement at XD01 with candidate variables are shown in Fig. 11. As shown in this figure, the MICs of all candidate variables with landslide displacement at XD01 are greater than 0.3. Moreover, the strongest correlation (i.e., an MIC greater than 0.6) was observed between the variation in the reservoir level and landslide displacement. These findings correspond well with current research, which has indicated that the movement of XD01 is more sensitive to variations in the reservoir level (Miao et al. 2017; Ma et al. 2022). Therefore, the key variables, including rainfall (\(x_{1}\) and \(x_{2}\)), reservoir water level (\(x_{3}\)), variation in the reservoir level (\(x_{4}\)), and evolution state (\(x_{5}\), \(x_{6}\), and \(x_{7}\)), were set as the final inputs for model training.

Fig. 11 Scatter matrix showing the pairwise correlations of the landslide displacement at XD01 (y) with rainfall (\(x_{1}\) and \(x_{2}\)), reservoir water level (\(x_{3}\)), variation in the reservoir level (\(x_{4}\)), and evolution state (\(x_{5}\), \(x_{6}\), and \(x_{7}\)). The lower left panels show the MIC, and the upper right panels show the corresponding data points

Results comparison

Comparison of single predictions

The predictions for the test dataset from 100 runs are shown in Fig. 12a–e. The average values from 100 runs were computed and are shown in Fig. 12f. As shown in this figure, due to the inherent stochastic nature of the metaheuristics, noticeably different predictions were observed among the 100 separate runs. According to the statistics for the 100 runs, the following results can be obtained:

Fig. 12 a–e Predictions of landslide displacement for XD01 by a ABC-SVR, b GA-SVR, c GWO-SVR, d PSO-SVR, and e WCA-SVR on the test dataset; f comparison of mean prediction from the metaheuristic-based SVR methods

For the best prediction, by using the RMSE criterion, WCA provides the best prediction with the lowest RMSE. GA outperforms the rest of the metaheuristics when considering the KGE criterion.

For mean prediction, in terms of the RMSE criterion, the mean prediction using WCA outperforms the rest of the metaheuristic methods. In terms of the KGE criterion, GA provides the best mean prediction. The performance rankings are different from those of the Shuping landslide.

In summary, the performance ranking from a single run was highly dependent on the selected evaluation criteria and case. There is no guarantee that one algorithm will outperform all others in all cases. Further evaluations among the five metaheuristic methods are needed.

Nonparametric statistical analysis

The model ranks of the metaheuristic-based SVR methods from the Friedman tests are listed in Table 3. p values much lower than 0.05 were obtained, which clearly indicates significant differences in terms of precision and computational time. The rankings based on \(F_{f}\) are listed in Table 3. As shown in this table, based on the \(F_{f}\) of the KGE and RMSE criteria, the compared models are ranked as follows: GA, WCA, PSO, ABC, and GWO. Although some differences in model rankings were observed with the Shuping landslide, GA ranks first for both cases. In terms of computational time, WCA is the most efficient method for both cases.

Table 3 Comparison of the performance of the metaheuristic-based SVR methods for the Baishuihe landslide data
Table 4 Performance comparison of various prediction models for the Baishuihe landslide data

Sensitivity analysis

As shown in Table 3 and Fig. 13, predictions with significant bias were provided by the WCA- and GA-based SVR methods during multiple runs, with wider ranges of RMSE and KGE and larger standard deviations. These results demonstrate the poor stability of WCA- and GA-based SVRs. In particular, WCA suffers from the most serious robustness issues with the widest range of RMSE and KGE and the largest standard deviation. PSO-, ABC-, and GWO-based SVRs achieve better stability during 100 runs with narrow ranges of RMSE and KGE values and lower standard deviations. Among them, the PSO-based SVR is the most stable with the lowest standard deviation (Table 3).

Fig. 13 Comparison of metaheuristic-based SVR methods for XD01 in terms of the a RMSE, b KGE, and c computational time

Convergence analysis

The following trends were observed from the optimal and mean fitness values shown in Fig. 14: the mean fitness value from GA remained far from the horizontal axis until the optimization process ended. The optimal fitness value from WCA converged to the optimal solution soonest (after 80 iterations). Equal fitness values were reached among the swarm-based algorithms, including the ABC, PSO, and GWO algorithms.

Fig. 14 Comparison of the optimal and mean fitness values for XD01 of different metaheuristic methods

The prediction from the present research has been further compared with various prediction models for the Baishuihe landslide. It was shown that the hybrid approach integrating CV and metaheuristic-based SVR had the largest R, outperforming those methods reported in previous research.

In summary, based on a single-run comparison, the performance ranking of metaheuristic-optimized SVRs was highly dependent on the selected evaluation criteria and case. WCA-SVR achieved the best single prediction, while GA-SVR provided superior mean prediction. Based on Friedman tests of the KGE and RMSE criteria, GA ranks first for both the Shuping and Baishuihe landslides with its superior performance. The Friedman test of computational time demonstrates that WCA is the most efficient method as it is capable of finding the optimal solution soonest. The best stability was achieved by PSO-based SVR. Such findings indicate that the hybrid approach based on PSO and SVR is a promising tool for predicting landslide displacement with a high level of precision, fast convergence, and stability.

Discussion

Overall, metaheuristic methods can provide satisfactory predictions. Based on a single-run comparison, the performance ranking was highly dependent on the selected evaluation criteria and case. The Friedman tests of RMSE, KGE, and computational time from multiple runs revealed significant differences among the algorithms. The experimental results for the Shuping and Baishuihe landslide data indicate that GA and PSO are capable of providing reliable predictions with high precision. In terms of computational time, WCA and PSO are efficient. In addition, PSO and ABC exhibit good robustness. Moreover, compared with evolution-based algorithms such as GA, swarm-based algorithms have fewer parameters and do not require crossover and mutation probabilities (Abderazek et al. 2020). Taken together, PSO is competitive in terms of precision, computational time, and robustness.

In the present study, the Gaussian kernel was chosen based on previous recommendations. To verify this choice, the performance of PSO-SVR was compared among different kernel types. The evaluation criteria for ZG88 and XD01 were computed and are listed in Table 5. As shown in this table, PSO-SVR with a Gaussian kernel provides the best performance, with the lowest RMSE and highest KGE for both ZG88 and XD01. These results are consistent with previous findings that the Gaussian kernel can be safely applied because it provides accurate results (Ahmadi et al. 2015; Karasu et al. 2020). In terms of computational time, PSO-SVR with a polynomial kernel is the most demanding, while PSO-SVR with a sigmoid kernel is the most efficient, followed by the Gaussian kernel.

Table 5 Performance comparison for PSO-SVR with different kernel types for displacement prediction of ZG88 and XD01
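For reference, the three kernel types compared above can be written as short functions. This is a sketch only: the parameter names `gamma`, `coef0`, and `degree` and their default values follow a common SVM convention and are assumptions, not the settings used in this study.

```python
import math


def dot(x, y):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(x, y))


def gaussian_kernel(x, y, gamma=0.5):
    """Gaussian (RBF) kernel: exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def polynomial_kernel(x, y, gamma=0.5, coef0=1.0, degree=3):
    """Polynomial kernel: (gamma * <x, y> + coef0) ** degree."""
    return (gamma * dot(x, y) + coef0) ** degree


def sigmoid_kernel(x, y, gamma=0.5, coef0=0.0):
    """Sigmoid kernel: tanh(gamma * <x, y> + coef0)."""
    return math.tanh(gamma * dot(x, y) + coef0)
```

The Gaussian kernel has a single shape parameter, whereas the polynomial kernel adds a degree and an offset, so the number of hyperparameters the optimizer must tune differs between kernel types.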

The strengths and weaknesses of the compared metaheuristic methods for landslide displacement prediction are summarized in Table 6. As stated in the “no free lunch” theorem (Wolpert and Macready 1997), an algorithm that performs best for a specific problem may not perform best for other types of problems. Therefore, the rankings obtained in the present study are valid only for the specific set of algorithms considered for landslide displacement prediction. For other sets of metaheuristic methods, the rankings may differ significantly. For different scenarios, it is recommended to rerun the nonparametric Friedman test.

Table 6 Summary of the strengths and weaknesses of the metaheuristic methods considered for landslide displacement prediction
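The Friedman test recommended above ranks the algorithms within each run and then tests whether their mean ranks differ. The following is a minimal sketch of the statistic, assuming no tie correction and a hypothetical data layout in which `scores[i][j]` is the criterion value of algorithm `j` in run `i`. It is not the code used in this study.

```python
def average_ranks(row):
    """Rank the values of one run (1 = smallest), averaging ranks over ties."""
    order = sorted(range(len(row)), key=lambda j: row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a block of tied values.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def friedman(scores):
    """Return the mean rank per algorithm and the Friedman chi-square
    statistic (without tie correction) for an n-runs by k-algorithms table."""
    n, k = len(scores), len(scores[0])
    rank_rows = [average_ranks(row) for row in scores]
    mean_ranks = [sum(r[j] for r in rank_rows) / n for j in range(k)]
    stat = 12 * n / (k * (k + 1)) * sum(
        (m - (k + 1) / 2) ** 2 for m in mean_ranks
    )
    return mean_ranks, stat
```

For a lower-is-better criterion such as RMSE, rank 1 is the best algorithm; for KGE the values can be negated before ranking. With five algorithms, the statistic is compared against a chi-square distribution with four degrees of freedom (critical value of about 9.49 at the 0.05 level). In practice, a library routine such as `scipy.stats.friedmanchisquare` returns the p value directly and also corrects for ties.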

Conclusion

In the present study, a hybrid approach integrating the k-fold CV, metaheuristic SVR, and nonparametric Friedman test was proposed to enhance reproducibility by presenting the statistical significance. Five metaheuristic methods, including ABC, GA, GWO, PSO, and WCA, were utilized for hyperparameter optimization in SVR for displacement prediction and compared on the benchmark datasets from the Shuping and Baishuihe landslides. Nonparametric Friedman tests were performed to reveal significant differences. The following conclusions were obtained:

Based on a single-run comparison, the performance ranking was highly dependent on the selected evaluation criteria and case.

The hybrid approach based on the k-fold CV, metaheuristic SVR, and nonparametric Friedman test can be employed to enhance accuracy and reliability in ML-based prediction by tuning the hyperparameters and presenting the statistical significance. The p values of the nonparametric Friedman tests confirmed the existence of significant differences in terms of precision and computational time. GA is the best algorithm for landslide displacement prediction in terms of precision, and WCA is the most efficient algorithm in terms of computational time but suffers from serious robustness issues. PSO maintains a balance among precision, computational time, and robustness.

The nonparametric Friedman test can serve as a useful basis for comparing metaheuristic algorithms in terms of statistical significance. Notably, for the specific set of algorithms considered, the rankings may also be suitable for displacement prediction of other landslides with step-like movement patterns in the TGRA. For different scenarios, rerunning the nonparametric Friedman test is recommended.