Open Access 01-10-2024 | Original Paper

Travel distance estimation of landslide-induced debris flows by machine learning method in Nepal Himalaya after the Gorkha earthquake

Authors: Chenchen Qiu, Xueyu Geng

Published in: Bulletin of Engineering Geology and the Environment | Issue 10/2024

Abstract

Debris flows are more likely to be triggered in earthquake-struck areas with a widespread presence of unstable slopes, causing severe casualties and changing the surrounding natural topography. In such a scenario, estimating the travel distance of debris flows becomes crucial for delineating the hazardous areas. Therefore, a hybrid machine learning model (GA-XGBoost) was employed to achieve a reliable estimation of debris-flow travel distance. This model was applied to the Nepal Himalayas, the site of the 2015 Gorkha earthquake. We selected four geomorphological factors for travel distance estimation: the volume of failure mass (VL), the height difference between the material source center and the end point of the movement mass (H), the mean gradient of the travel path (J), and the mean curvature of the travel path (C). Furthermore, to eliminate noise and enhance the stability of the input data, a principal component analysis (PCA) was used to generate three principal components (PC1, PC2, and PC3) from the selected factors to serve as input variables for model development. The performance of this model was evaluated using three assessment indexes, resulting in a mean absolute percentage error (MAPE) of 8.71%, a root mean square error (RMSE) of 144.3 m, and a mean absolute error (MAE) of 86.1 m. Four empirical approaches were also introduced for comparison. Our proposed model proved to be superior and effective, as the estimated results closely match the actual values. All the results affirm the suitability of our developed model for estimating the travel distance of landslide-induced debris flows following a strong earthquake.
Notes

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1. Introduction

Debris flows, defined as debris grains with slurry (Phillips and Davies 1991), can continue to pose a significant threat in mountainous areas long after the occurrence of strong earthquakes (Shieh et al. 2009), since the affected areas are highly susceptible to debris flows in the first 5–10 years following such seismic events (Tang et al. 2009). This is because ground shaking can weaken the stability of slopes (Fan et al. 2018) and increase their likelihood of collapse (Lv et al. 2017). Furthermore, the unstable mass on slopes may start to move during the rainy season, potentially transitioning into unsaturated/saturated flows (Tang et al. 2015) or temporarily accumulating at the base of slopes, feeding potential debris flows (Dahlquist and West 2019). As a result, the magnitude of debris flows can reach \(10^{5}\ \mathrm{m^{3}}\) or even greater due to the entrainment of large quantities of material during the flow process (Crosta et al. 2003), finally leading to a long travel distance and causing severe losses of property and human lives (Dahlquist and West 2019; Qiu et al. 2024b). Therefore, estimating the travel distance of debris flows in the aftermath of earthquakes on a regional scale is of paramount importance. It helps in identifying high-risk areas and developing effective mitigation strategies (Cascini et al. 2014; Corominas et al. 2014; Paudel et al. 2020; Zhang et al. 2013; Zhou et al. 2019).
In general, the travel distance of landslide-induced debris flows is estimated with two primary approaches: empirical methods and numerical modelling. The former can be dated back to the work of Scheidegger (1973), which related the ratio between the height difference (H) and the runout distance (L) to the reach angle α. Subsequent studies by Corominas (1996) and Rickenmann (1999) incorporated deposit volume into a relationship with H/L, making it a significant indicator in estimating the travel distance (Lorente et al. 2003; Devoli et al. 2009; Hürlimann et al. 2015). However, these empirical methods have limitations, as they struggle to adequately account for the complexity of debris-flow processes, which are influenced by a range of geomorphic, environmental and geological factors (Regmi et al. 2010). Therefore, an appropriate combination of predisposing factors is essential to reveal the travel characteristics of debris flows. In addition to the empirical methods, numerical modelling can provide relatively accurate estimations when using parameters obtained through controlled laboratory experiments, but this method is time-consuming and costly for regional application (Paudel et al. 2021; Qiu et al. 2022). Therefore, a new model for the travel analysis of landslide-induced debris flows after an earthquake is necessary to form effective debris-flow mitigation strategies. In this context, machine learning is introduced here due to its strong ability to learn complex and hidden relationships between a set of input variables and output results (Khosravi et al. 2021). Applications of machine learning models have demonstrated their superiority in hazard analysis (Lin et al. 2017; Rahmati et al. 2017). In this paper, an ensemble method (XGBoost) is employed for travel distance estimation. However, parameter optimization is a time-consuming process. Therefore, a genetic algorithm (GA) is used to generate optimal hyperparameters, allowing for the development of an optimal estimation model. Finally, a hybrid machine learning model is developed to estimate the travel distance of landslide-induced debris flows following an earthquake.
In this paper, a hybrid machine learning model is developed to estimate the travel distance of debris flows induced by landslides after the Mw 7.8 Nepal earthquake that occurred on 25 April 2015. The study area is characterized by its challenging topography, with rugged terrain and steep slopes. It experiences recurring debris flows each year, primarily attributed to active geological activities and abundant rainfall. Overall, our work aims to enhance understanding of landslide-induced debris flows after the earthquake and benefit the planning of mitigation strategies.

2. Study area

Our study area is located in the Nepal Himalaya, and the identified debris flows are distributed in the northern part of Nepal with an altitude ranging from 947 m to 2966 m above sea level (Fig. 1).
The diverse topography of Nepal gives rise to varying weather patterns, with the northern part predominantly occupying the Himalayan mountains and the southern part consisting of flat terrain. Nepal experiences a typical monsoon climate characterized by two distinct seasons each year: a dry season from October to March of the following year and a rainy season from April to September. As indicated in Fig. 1, the debris flows in this paper are primarily concentrated in the 14 districts of northern Nepal. To depict the weather conditions in these areas, the 14 districts are highlighted to illustrate precipitation distribution. The distribution of the mean annual precipitation from 1980 to 2015 is shown in Fig. 2. District 1 recorded the highest mean annual rainfall at 4950 mm, followed by District 7 with the second-highest precipitation at 4050 mm (Fig. 3). Meanwhile, District 3 experienced the lowest mean annual precipitation at 950 mm. This distribution pattern aligns with the findings of Du et al. (2020) and Li et al. (2011), which indicate that precipitation exhibits a negative correlation with altitude in the Himalayan regions.
Cold weather is typically associated with the dry season, often resulting from snowfall, particularly in the Himalayan regions. Subsequently, the temperature begins to rise in April, and the outside air in April and May becomes exceptionally humid and stifling, with thunderstorm activity occurring in the evenings. Following two months of sweltering conditions, the rainy season commences toward the end of June, driven by the southwest monsoon from the Indian Ocean (Adhikari and Koshimizu 2005). This monsoon climate brings abundant rainfall that persists for three months, lasting until the end of September. Figure 3 demonstrates a notable increase in precipitation from May to June. The peak rainfall occurs in July and August, with over 80% of the total precipitation occurring between June and September (Paudel et al. 2020). Therefore, substantial rainfall during the rainy season becomes a key triggering factor for a higher incidence of debris flows, particularly after the shaking of the Gorkha earthquake.
(The serial numbers of the districts are shown in Fig. 2; the data come from the Humanitarian Data Exchange, https://data.humdata.org/)
In addition to the meteorological conditions in the Nepal Himalayas, the lithology distribution also plays a crucial role in slope stability and the formation of debris flows. The identified debris flows are mainly distributed in three tectonic zones, namely the Tibetan-Tethys (TT), Higher Himalayan (HH), and Lesser Himalayan (LH) Zones, following the summary of Upreti (1999). Approximately a third of the debris flows in Fig. 1 are situated in the TT zone, extending from east to west in the northern Gorkha region, enclosed by the North Dipping Normal Fault System, the South Tibetan Detachment Fault System, and the Indus-Tsangpo Suture Zone. The geological strata in this zone range from the lower Paleozoic to lower Tertiary marine sedimentary succession. As for the HH zone lying below the TT zone, the main rock (gneisses) serves as the basement of the TT zone. The TT and HH zones were initially considered to be continuous until the detection of the South Tibetan Detachment Fault System (Fuchs 1977). Moreover, a small number of debris flows are observed in the LH zone, characterized by rocks that include sandstone, mudstone, and conglomerate, dating from Neogene to Quaternary. In summary, the complex geological conditions and abundant rainfall in these areas contribute to the occurrence of debris flows. Understanding the areas endangered by debris flows after an earthquake is therefore significant for the site selection of residential areas and infrastructure construction.

3. Methodology

In this paper, we utilized a machine-learning-based method to estimate the travel distance of the debris flows with the involvement of a series of predisposing factors after the Mw 7.8 Gorkha earthquake. The technical route is presented in Fig. 4.
This method is composed of several steps:
(1) Determined debris-flow paths and source areas using satellite images, ENVI, and GIS tools. First, the coordinates of the endpoints where the travelling materials stopped and the travel distances of the travelling materials were taken from Dahlquist and West (2019). A total of 89 debris flows were included in this study, and their travel distances range from 200 to 8010 m based on the dataset provided in Table 3 of Appendix A. Then, we overlaid the contour map onto the satellite images to identify the central points of the source areas, as illustrated in Fig. 5. This process enables us to establish the trajectories of debris flows.
(2) Carried out a three-step analysis comprising correlation analysis, multi-collinearity analysis, and principal component analysis (PCA). The PCA was employed to eliminate extraneous information from the selected variables and generate three principal components. After that, the three principal components were rescaled into the range [0.01, 0.99]. The normalized data act as input variables for model training, with the hyperparameters optimized by a genetic algorithm.
(3) Conducted model assessment using evaluation indexes such as RMSE, MAE, and MAPE. The contributions of the selected variables to estimation accuracy were also evaluated to demonstrate the importance of involving all the factors in model development.
(4) Compared the estimation results of the developed model with existing empirical equations using the testing dataset.

3.1 Selection of predisposing factors

The importance of factor selection has been addressed by McDougall (2017). However, the challenge of selecting the most appropriate input parameters still persists. Therefore, factor selection for estimating travel distance holds substantial importance. Our study not only strives to achieve reliable estimations but also aims to offer valuable guidance for factor selection when analyzing the risk of debris flows in the Himalayas. We have preliminarily identified key factors that could contribute to travel distance estimation. These factors encompass various aspects of geomorphic and environmental conditions, including the volume of failure mass (VL), the drop height between the centre of the source area and the endpoint of movement mass (H), the mean gradient of the travelling path (J), the mean curvature of travelling path (C) and the normalized difference vegetation index (NDVI).
VL and H serve as indicators of the potential energy stored within the failure mass, offering insights into its subsequent movement distance (Roback et al. 2018; Zhan et al. 2017; Puglisi et al. 2015; Qiu et al. 2024c). A greater sediment volume normally causes a longer runout distance (Legros et al. 2002; Guo et al. 2016; Falconi et al. 2023). As for the mean gradient of the travelling path (J), this factor has been shown to correlate strongly with the travel distance (Rickenmann 1999). Notably, our calculation of J diverges from that of Rickenmann (1999): J is calculated using the formula proposed by IMHE (1994):
$$J=\frac{{\left( {\sum\limits_{{i=1}}^{n} {\left( {{E_{i - 1}}+{E_i}} \right){L_i} - 2{E_0}L} } \right)}}{{{L^2}}}$$
(1)
where J is the average path gradient (‰); \(E_1, E_2, \ldots, E_{i-1}, E_i\) are the elevations of each break point in the movement path (m), obtained from a 12.5 m digital elevation model (DEM, downloaded from https://search.asf.alaska.edu/#/); \(L_1, L_2, \ldots, L_{i-1}, L_i\) are the lengths of each section of the movement path (m); n is the number of path sections; \(E_0\) is the elevation of the endpoint of mass movement (m); and L is the length of the travel path (m). The divided sections are presented in Fig. 6.
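As a worked illustration of Eq. (1), the following minimal Python sketch evaluates J for a made-up path. The function name `mean_path_gradient`, the ordering of the elevations (starting at the endpoint E_0 and moving up the path), and the sample numbers are assumptions for illustration, not values from the dataset.

```python
def mean_path_gradient(elevations, lengths):
    """Mean gradient J of a travel path following Eq. (1).

    elevations: E_0, E_1, ..., E_n in metres, assumed to start at the
                endpoint of mass movement (E_0) and move up the path.
    lengths:    L_1, ..., L_n in metres, one per path section.
    Returns J as a dimensionless ratio; multiply by 1000 for per mille.
    """
    if len(elevations) != len(lengths) + 1:
        raise ValueError("need exactly one more elevation than section lengths")
    total_length = sum(lengths)  # L in Eq. (1)
    weighted = sum(
        (elevations[i - 1] + elevations[i]) * lengths[i - 1]
        for i in range(1, len(elevations))
    )
    return (weighted - 2 * elevations[0] * total_length) / total_length ** 2


# Hypothetical path: two 500 m sections, each climbing 100 m.
j = mean_path_gradient([1000.0, 1100.0, 1200.0], [500.0, 500.0])
j_per_mille = 1000 * j
```

For a path with a constant slope, the expression reduces to that slope, which is a convenient sanity check on any implementation.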
Moreover, C has been used by previous studies to delineate the hazardous area (Yu et al. 2006), a factor closely linked to the runout distance of debris flows (Hürlimann et al. 2015; Prochaska et al. 2008). C represents the mean curvature of the flowing path of a debris-flow event, which was calculated using the ‘Surface’ tool in GIS. Similar to the impacts of C on debris-flow runout distance, NDVI is also considered for the runout estimation (Booth et al. 2020), and serves as a reflection of vegetation coverage within debris-flow affected areas. Lush vegetation can cause a reduction of the debris-flow velocity (Lancaster et al. 2003) and, therefore, result in a decrease in the travel distance (Michelini et al. 2017). In this study, Landsat 8 satellite images captured prior to the occurrence of debris flows are utilized (USGS 2015). These images enable the extraction of mean NDVI values, which represent vegetation coverage. However, whether all these factors are appropriate for the travel distance estimation still needs further studies. As such, a comprehensive three-step analysis is conducted to ensure strong correlations between input variables and travel distance. Simultaneously, collinearity analysis among the input variables and dimension reduction techniques, such as Principal Component Analysis (PCA), is employed to generate a set of corrected variables.

3.2 A three-step analysis

A three-step analysis was employed here to decide the input variables for model development. This analysis is composed of a correlation analysis, a multi-collinearity analysis, and the PCA. The correlation analysis is conducted with SPSS statistical software to unveil the relationship between each input variable and the travel distance (L). As a result, we can identify and eliminate variables with weak correlations to L. Subsequently, a multi-collinearity analysis is conducted among the remaining variables to avoid distortion and inaccuracy in travel distance estimation. Multi-collinearity describes linear dependence among the variables, implying that a specific independent factor can be substituted by other variables through a linear equation. In this context, two indices, the tolerance index (TOL) and the variance inflation factor (VIF), were used in this section. Multi-collinearity is identified when the TOL value is smaller than 0.1 (Menard 2002) or the VIF value exceeds 10 (Guns and Vanacker 2012). These two indices are calculated using the following equations:
$$Tolerance=1 - R_{i}^{2}$$
(2)
$$VIF=\left[ {\frac{1}{{Tolerance}}} \right]$$
(3)
where \(R_i^2\) denotes the coefficient of determination of the regression model in which \(X_i\) is the dependent variable and the other input variables are the independent variables. Following the first two steps of the analysis, the importance of each variable in contributing to the travel distance can be initially assessed. However, further data processing remains crucial due to intercorrelations among these factors. Additionally, useless information within the data should also be removed, since it can increase the difficulty of the analysis (Chaib et al. 2015). Therefore, PCA was introduced to reduce the data dimension and eliminate the correlation between factors, using Origin software. This method seeks to generate new indices, termed ‘principal components’, which encapsulate the most essential information in the data. The fundamental PCA process comprises several steps: (1) normalize the multi-dimensional data matrix; (2) calculate the eigenvalues and eigenvectors of this matrix; (3) arrange the eigenvalues and eigenvectors in descending order; (4) select the first k values based on the cumulative contributions; and (5) generate a new k-dimensional matrix through dimension reduction. The whole process can be described as:
$$T=\left(T_1,T_2,\ldots,T_n\right)=\begin{bmatrix}t_{11} & t_{12} & \cdots & t_{1m}\\ \vdots & \vdots & \ddots & \vdots\\ t_{n1} & t_{n2} & \cdots & t_{nm}\end{bmatrix}$$
(4)
where n samples are included in T, and each sample contains m variables. Then, the matrix V, consisting of the calculated eigenvectors V = (V1, V2, …, Vm), is used to generate the principal components:
$$\left\{ \begin{gathered} {V_1}={v_{11}}{T_1}+{v_{12}}{T_2}+ \cdots +{v_{1n}}{T_n} \hfill \\ {V_2}={v_{21}}{T_1}+{v_{22}}{T_2}+ \cdots +{v_{2n}}{T_n} \hfill \\ \vdots \hfill \\ {V_m}={v_{m1}}{T_1}+{v_{m2}}{T_2}+ \cdots +{v_{mn}}{T_n} \hfill \\ \end{gathered} \right.$$
(5)
where n > m, and the original data is replaced by the principal components, V1, V2, …, Vm. To further enhance input stability and ease data processing for the model, the three generated principal components are normalized into the range [0.01, 0.99] using:
$${x_{nor}}=\frac{{x - \hbox{min} (x)}}{{\hbox{max} (x) - \hbox{min} (x)}}(U - L)+L$$
(6)
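where \(x_{nor}\) represents the normalized value obtained from the raw value x, and U and L are the upper and lower normalization bounds (here 0.99 and 0.01), respectively. Note that L in Eq. (6) denotes the lower bound, not the travel distance.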
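The rescaling in Eq. (6) can be sketched in a few lines of Python. The function name `rescale` and the sample scores are hypothetical; only the bounds 0.01 and 0.99 come from the text.

```python
def rescale(values, lower=0.01, upper=0.99):
    """Min-max rescaling of Eq. (6): map values linearly onto [lower, upper]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("cannot rescale a constant series")
    span = upper - lower
    return [(v - v_min) / (v_max - v_min) * span + lower for v in values]


# Hypothetical principal-component scores for three debris flows.
scaled = rescale([-2.0, 0.0, 3.0])
```

Keeping the bounds slightly inside [0, 1] means that no sample is mapped exactly to 0 or 1, which is presumably the motivation for choosing 0.01 and 0.99.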

3.3 Development of a machine learning model

In contrast to empirical equations, a machine learning model can encompass a large number of independent factors, which serve to capture topographic and environmental features. One such algorithm, extreme gradient boosting (XGBoost), a boosting-related algorithm, has excelled in various competitions due to its use of the second derivative in the calculation of loss functions and its incorporation of additional regularization terms. Therefore, XGBoost stands out as a strong choice for conducting regression analysis. Consistent with other boosting algorithms, including AdaBoost and GBDT, XGBoost is composed of weak regressors, often represented as decision trees. To obtain the estimation value, this model successively produces trees, which function as weak regressors. After the completion of model training, the cumulative scores associated with the leaf nodes of these trees collectively yield the final estimation value. The mechanism of XGBoost is presented in Appendix A. In this study, Python was used to develop an XGBoost model in the PyCharm editing environment.
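The genetic algorithm used for hyperparameter search is not detailed in this section. As a rough illustration only, the sketch below runs a small genetic search over two hyperparameters. The function `genetic_search`, the selection, crossover and mutation operators, the bounds, and the toy fitness function are all assumptions made for illustration; in a setting like the one described here, the fitness would more plausibly be a cross-validated RMSE of an XGBoost regressor, which is not reproduced here.

```python
import random


def genetic_search(fitness, bounds, pop_size=20, generations=30,
                   mutation_rate=0.2, seed=0):
    """Minimise fitness(params) over box bounds given as {name: (low, high)}."""
    rng = random.Random(seed)
    names = list(bounds)

    def random_individual():
        return {k: rng.uniform(*bounds[k]) for k in names}

    def crossover(a, b):
        # Uniform crossover: each gene is copied from one of the two parents.
        return {k: (a[k] if rng.random() < 0.5 else b[k]) for k in names}

    def mutate(ind):
        # Resample a gene inside its bounds with probability mutation_rate.
        return {k: (rng.uniform(*bounds[k]) if rng.random() < mutation_rate else v)
                for k, v in ind.items()}

    population = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness)
        elite = ranked[: pop_size // 2]  # truncation selection keeps the best half
        children = [mutate(crossover(rng.choice(elite), rng.choice(elite)))
                    for _ in range(pop_size - len(elite))]
        population = elite + children
    return min(population, key=fitness)


def toy_fitness(params):
    # Stand-in for a validation error; its minimum is placed arbitrarily.
    return (params["learning_rate"] - 0.1) ** 2 + ((params["max_depth"] - 6) / 10) ** 2


search_space = {"learning_rate": (0.01, 0.3), "max_depth": (2.0, 10.0)}
best = genetic_search(toy_fitness, search_space)
```

Because the best half of each generation is carried over unchanged, the best fitness never worsens from one generation to the next. Integer hyperparameters such as a tree depth would need rounding before being passed to a real model.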

3.4 Model assessment

Model assessment is an important component of model development to test the estimation performance. To assess the estimation results of this developed model, the root mean square error (RMSE) and mean absolute error (MAE) were employed to evaluate model performance. RMSE and MAE are widely used indexes to evaluate model performance (Chai and Draxler 2014). In essence, these two indexes can both reflect the reliability and efficiency of the developed model by quantifying the disparities between actual and estimation values. It’s worth noting that RMSE tends to be more sensitive to outliers in the input data compared to MAE (Willmott and Matsuura 2005). Apart from the two metrics for evaluating actual errors, another evaluation index, MAPE (Mean Absolute Percentage Error), was introduced to present the error ratio due to its independence and interpretability (Bowerman et al. 2005). Therefore, the three metrics employed in our study not only evaluate model performance but also aid in identifying outlier values. These statistical indexes can be calculated by:
$$RMSE=\sqrt {\frac{{\sum\limits_{{i=1}}^{n} {{{\left( {{y_{ipre}} - {y_i}} \right)}^2}} }}{n}}$$
(7)
$$MAE=\frac{{\sum\limits_{{i=1}}^{n} {\left| {{y_{ipre}} - {y_i}} \right|} }}{n}$$
(8)
$$MAPE=\frac{{100\% }}{n}\sum\limits_{{i=1}}^{n} {\left| {\frac{{{y_{ipre}} - {y_i}}}{{{y_i}}}} \right|}$$
(9)
where \(y_{ipre}\) represents the estimated value, \(y_i\) is the actual value, and n is the number of estimated values. A better model is indicated if the calculated RMSE, MAE, and MAPE are closer to 0. Moreover, to further reveal the contribution of each variable to estimating the travel distance, one variable is removed from model development at a time to generate five estimation models. Then the RMSEs and MAEs of each model are calculated. Meanwhile, the ratio of RMSE to MAE of each model is also calculated, as abnormal values may cause instability in the output results; this ratio can therefore reflect the model’s stability.
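Equations (7) to (9) translate directly into code. The sketch below uses only the Python standard library; the function names and the sample distances are illustrative and are not taken from the dataset.

```python
import math


def rmse(actual, estimated):
    """Root mean square error, Eq. (7)."""
    return math.sqrt(sum((e - a) ** 2 for a, e in zip(actual, estimated)) / len(actual))


def mae(actual, estimated):
    """Mean absolute error, Eq. (8)."""
    return sum(abs(e - a) for a, e in zip(actual, estimated)) / len(actual)


def mape(actual, estimated):
    """Mean absolute percentage error in percent, Eq. (9)."""
    return 100.0 / len(actual) * sum(abs((e - a) / a) for a, e in zip(actual, estimated))


# Made-up travel distances in metres.
measured = [100.0, 200.0, 400.0]
predicted = [110.0, 190.0, 400.0]

errors = {
    "RMSE": rmse(measured, predicted),
    "MAE": mae(measured, predicted),
    "MAPE": mape(measured, predicted),
}
stability_ratio = errors["RMSE"] / errors["MAE"]  # grows when a few errors dominate
```

RMSE is never smaller than MAE, so the ratio is at least 1 and increases when a small number of large errors dominate.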

4. Result analysis

4.1 Determination of input variables

Five scatter diagrams (Fig. 7), using the data in Table 3 (see Appendix A), are presented below to describe the correlations of travel distance (L) with each candidate variable. Initial filtration can be achieved based on the r values in these figures. Among the variables considered, height difference (H) exhibits the strongest correlation with the travel distance, closely followed by the volume of failure mass (VL). As for the other variables, J and C, a stronger correlation is observed between J and L, reaching 0.545. C displays a correlation value of 0.401 with L. Conversely, NDVI demonstrates a weak correlation with travel distance, leading to its exclusion from the model development process.
After the correlation analysis between the input and output variables, the subsequent step, the multi-collinearity analysis, is conducted to assess the linearity among the remaining variables: VL, H, J, and C. The calculated results, as presented in Table 1, clearly indicate the absence of multi-collinearity among these variables, as all TOL values exceed the threshold of 0.1. Furthermore, no VIF value exceeds 10, affirming the suitability of all four variables for inclusion in the model development process.
Table 1
Tolerance (TOL) values and variance inflation factors (VIF)

| Factor | TOL | VIF |
| --- | --- | --- |
| Volume of failure mass (VL) | 0.743 | 1.346 |
| Height difference between center of source area and end point of mass movement (H) | 0.528 | 1.894 |
| Mean gradient of travelling path (J) | 0.692 | 1.445 |
| Mean curvature of travelling path (C) | 0.866 | 1.155 |
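As a quick arithmetic check of Eqs. (2) and (3) against Table 1, the sketch below recomputes each VIF as the reciprocal of the reported TOL and applies the two thresholds quoted in Sect. 3.2. The helper names are hypothetical; the numbers are those of Table 1.

```python
# (TOL, VIF) pairs as reported in Table 1.
table_1 = {
    "VL": (0.743, 1.346),
    "H": (0.528, 1.894),
    "J": (0.692, 1.445),
    "C": (0.866, 1.155),
}


def vif_from_tolerance(tol):
    """Eq. (3): VIF is the reciprocal of the tolerance."""
    return 1.0 / tol


def has_multicollinearity(tol, vif, tol_min=0.1, vif_max=10.0):
    """Flag a factor if TOL < 0.1 or VIF > 10, the thresholds used in the text."""
    return tol < tol_min or vif > vif_max


recomputed = {name: vif_from_tolerance(tol) for name, (tol, _) in table_1.items()}
flagged = [name for name, (tol, vif) in table_1.items()
           if has_multicollinearity(tol, vif)]
```

The recomputed values agree with the tabulated VIFs to within rounding, and no factor is flagged.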
The three principal components generated by the PCA from the four retained factors are expressed as:
$$\left\{\begin{array}{c}PC1=0.4684V_L+0.6012H-0.4999J-0.4114C\\PC2=0.2243V_L+0.1882H-0.5979J+0.7462C\\PC3=0.7765V_L+0.0479H+0.365J+0.5125C\end{array}\right.$$
(10)
To ensure a strong correlation between the generated principal components and travel distance, Pearson’s coefficient was introduced again to test the correlation between dependent and independent variables. The results show that PC1 presents the strongest correlation with L, reaching 0.744, followed by PC2, 0.730. The lowest correlation coefficient among the three input variables is represented by the PC3, 0.726. Overall, the three principal components present a strong correlation with L since the absolute values are all larger than 0.7.
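To make Eq. (10) and the correlation check concrete, the sketch below projects one sample onto the three components and defines a plain Pearson coefficient. It assumes that the factor values have already been normalized as in step (1) of the PCA procedure; the sample values and function names are hypothetical, and only the loadings come from Eq. (10).

```python
import math

# Loadings copied from Eq. (10).
LOADINGS = {
    "PC1": {"VL": 0.4684, "H": 0.6012, "J": -0.4999, "C": -0.4114},
    "PC2": {"VL": 0.2243, "H": 0.1882, "J": -0.5979, "C": 0.7462},
    "PC3": {"VL": 0.7765, "H": 0.0479, "J": 0.365, "C": 0.5125},
}


def principal_components(sample):
    """Weighted sums of Eq. (10) for one sample of normalized factor values."""
    return {pc: sum(w * sample[f] for f, w in weights.items())
            for pc, weights in LOADINGS.items()}


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical normalized factor values for a single debris flow.
scores = principal_components({"VL": 1.0, "H": 0.5, "J": -0.2, "C": 0.1})
```

Applying `pearson_r` to a column of component scores and the corresponding travel distances would give coefficients of the kind quoted above.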

4.2 Estimation of travel distance and evaluation of model performance

The normalized principal components are used to train the machine learning model using Python in the PyCharm editing environment. The results of the resulting estimation model are plotted in Fig. 8.
As depicted in Fig. 8, the estimation results exhibit minor errors relative to the measured values (see green labels in Fig. 8). To visualize the differences between the estimation results and measured values, the estimation errors are plotted in Fig. 8 (see dark blue labels). Over 80% of the errors are less than 100 m, and the minimum estimation error is as low as 3.2 m. To provide a comprehensive evaluation of this model’s performance, we calculate the MAPE, RMSE, and MAE, yielding values of 8.71%, 144.30 m and 86.15 m, respectively. It’s worth mentioning that the RMSE exceeds 100 m, even though most errors are less than 100 m. This can be attributed to the RMSE being more sensitive to abnormal values than the MAE. So, we employ both RMSE and MAE to reflect the discrepancies between the estimation results and measured values. Moreover, most of the estimated values are lower than the actual values. This discrepancy may be due, in part, to neglecting the retention of the failure mass during the transportation process: we calculate the failure mass according to the source area without accounting for the potential loss of material along the travel path.

4.3 Sensitivity analysis

In addition to assessing the overall model performance, it is equally essential to discern the individual contribution of each geomorphic factor to the travel distance estimation. This factor importance analysis not only sheds light on the significance of each raw factor but also offers valuable insights for mitigating debris flows. To achieve this, we evaluate the significance of each raw factor by excluding factors one at a time, thereby training a total of 15 models: the PCA model (PC1 + PC2 + PC3), Model 1 (VL + H + J + C), Model 2 (VL + H + J), Model 3 (VL + H + C), Model 4 (H + J + C), Model 5 (VL + H), Model 6 (H + J), Model 7 (H + C), Model 8 (VL + J), Model 9 (VL + C), Model 10 (J + C), Model 11 (H), Model 12 (VL), Model 13 (J), and Model 14 (C). After that, we test the estimation accuracy of the 15 models based on the RMSE, MAE, and MAPE indices. Before conducting the sensitivity analysis, we plotted the estimation results of the PCA model and Model 1 to test the efficiency of the PCA method in removing noise and therefore increasing estimation accuracy (Fig. 9). As indicated in Fig. 9, the PCA model performs better than Model 1, since the estimation results of Model 1 exhibit greater divergence from the measured values.
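The factor combinations can be enumerated mechanically, which is a convenient way to set up such a leave-factors-out comparison. The sketch below is only an illustration with hypothetical variable names. Note that enumerating every non-empty subset of the four factors yields 15 combinations, one more than the 14 factor-based models listed above: the combination VL + J + C does not appear in that list.

```python
from itertools import combinations

FACTORS = ("VL", "H", "J", "C")

# All non-empty factor subsets, from the full set down to single factors.
subsets = [
    combo
    for size in range(len(FACTORS), 0, -1)
    for combo in combinations(FACTORS, size)
]
labels = [" + ".join(combo) for combo in subsets]
```

Each entry of `subsets` could then be used to select the input columns for one model before computing its RMSE, MAE, and MAPE.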
After that, we calculate the MAPE, RMSE, and MAE values of the models to conduct the sensitivity analysis, as illustrated in Fig. 10. Figure 10(a) shows that a decreasing proportion of 36.9% is achieved for MAPE after the principal components were used as input data when compared with Model 1. It’s worth noting that a decrease of MAPE indicates an increase of estimation accuracy. A similar decline of 33.1% is noted when factor C is omitted from the factor combination (Model 2). However, a greater MAPE decrease is observed when removing J from model development (Model 3), which reaches a proportion of 43.3%. As for Model 4, comprising H, J, and C, it achieves a MAPE of 17.5%, which is slightly larger than the MAPE value of Model 3 (17.1%). It can be concluded that J plays a more critical role than C in travel distance estimation. As for the models involving two factors, Model 5 performs the best, which indicates that the contributions of J and C to the model development are limited in comparison to VL and H. A significant percentage reduction of MAPE can be found when the H factor in Model 6 is replaced by VL (Model 8), reaching 79.2%. Model 10 exhibits the largest MAPE value among the two-factor models due to its combination of J and C. Furthermore, the MAPE value ranges from 45.8 to 79.3% if only H, VL, J, or C is used individually for model development. This underscores the pivotal roles of VL and H as the main control factors, determining the potential energy and the distance the failure mass can travel (Lo
As illustrated in Fig. 11, the PCA model achieves the most stable outputs, with estimation errors ranging from 8.1 to 182.7 m. As for Model 1, its maximum estimation error increases to 410.5 m, and the minimum one shows a slight increase, reaching 9.5 m. After we exclude C from model development (Model 2), the maximum estimation error rises to 693.6 m. The other two models involving three factors (Model 3 and Model 4) have a similar maximum value to Model 2. Among the six models from Model 5 to Model 10, the maximum estimation errors of Model 9 and Model 10 reach 1404.2 m and 2829.7 m. These two abnormal values are the leading cause of the RMSE rising in Fig. 10(b). Additionally, the maximum error increases from 504.7 m to 1155.1 m from Model 4 to Model 5. This is also the main reason for the sudden increase of MAPE in Fig. 10(b). As for Models 11, 12, 13, and 14, their maximum errors are all greater than 3,000 m. Therefore, to estimate the travel distance of debris flows after the earthquake, all four factors should be involved in model development, and a PCA can be employed to further increase the estimation accuracy.
However, further verifications are still essential to present the superiority of this machine learning method in estimating travel distance when compared with existing empirical equations. So, the comparison analysis between the machine learning method in this paper and empirical methods is conducted in the next section.

5. Comparison with existing empirical equations

To further test the performance of the machine learning model, four empirical equations are employed in this section. The proposed equations in the past four studies are presented in Table 2. The calculated results using the testing dataset are plotted in Fig. 12.
Table 2
Summary of the empirical equations for travel distance

| Source | Equation | Dataset |
| --- | --- | --- |
| Rickenmann (1999) | \(L=1.9{M^{0.16}}H_{e}^{0.83}\) (11) | Italy, Japan, China, Switzerland, U.S.A, Colombia |
| Lorente et al. (2003) | \(L=7.13{\left( {M{H_e}} \right)^{0.271}}\) (12) | Central Spanish Pyrenees |
| Hürlimann et al. (2015) | \(L=7.48{V^{0.45}}\) (13) | Switzerland |
Figure 12 illustrates that the estimation results of our model exhibit the closest agreement with the actual values, displaying a uniform distribution around the line of perfect agreement. This indicates that the machine learning method produces estimations without evident overestimation or underestimation. However, the estimated results by Rickenmann (1999) consistently tend to be higher than the measured values. As for the equation proposed by Rickenmann (1999), the author used data from different countries, including lahars from Mount St. Helens (U.S.A) and Nevado del Ruiz (Colombia), where ample water involvement led to longer travel distances. Consequently, the estimated results by the two equations are inevitably higher than the measured values. In contrast, the other two equations, proposed by Lorente et al. (2003) and Hürlimann et al. (2015), underestimate the travel distance. Lorente et al. (2003) defined H as the height difference between the head of the source area and the point where the travelling mass starts to deposit. However, this definition may ignore the length of the stopped sediments and, therefore, produce lower estimation values. Additionally, the equation of Hürlimann et al. (2015) relies on laboratory experiments. Laboratory experiments cannot fully simulate real scenarios, because the measured maximum travel distance of debris flows reaches only 450 m for large volumes in that study. Therefore, the proposed equation relying on the laboratory data is prone to underestimating the travel distance of debris flows. Overall, our model performs the best, as it exhibits the smallest error when compared to the measured values.
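For readers who want to reproduce this kind of comparison, the three relations listed in Table 2 can be written as small Python functions. The symbols are not defined in this section, so the sketch assumes that M and V denote debris-flow volume in m³ and that H_e denotes an elevation difference in m; the function and argument names are hypothetical.

```python
def rickenmann_1999(volume_m3, drop_height_m):
    """Eq. (11): L = 1.9 * M**0.16 * He**0.83."""
    return 1.9 * volume_m3 ** 0.16 * drop_height_m ** 0.83


def lorente_2003(volume_m3, drop_height_m):
    """Eq. (12): L = 7.13 * (M * He)**0.271."""
    return 7.13 * (volume_m3 * drop_height_m) ** 0.271


def hurlimann_2015(volume_m3):
    """Eq. (13): L = 7.48 * V**0.45."""
    return 7.48 * volume_m3 ** 0.45


# Made-up event: 10,000 m^3 of material dropping 500 m.
estimates = {
    "Rickenmann (1999)": rickenmann_1999(1.0e4, 500.0),
    "Lorente et al. (2003)": lorente_2003(1.0e4, 500.0),
    "Hürlimann et al. (2015)": hurlimann_2015(1.0e4),
}
```

Comparing such estimates with measured distances through the RMSE, MAE, and MAPE would reproduce the style of evaluation shown in Fig. 12, subject to the assumptions on symbols and units stated above.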

6. Discussion and limitations

The superiority of machine learning in travel distance estimation may mainly stem from its ability to incorporate more factors and thus describe debris-flow behaviour from multiple perspectives. Factor selection is therefore significant, since the input data should reflect the controls on travel distance and capture physical reality. Factors normally fall into two categories: geometric-morphological and magnitude-based (Zhou et al. 2019). For better estimation quality, we selected factors of both types for model development. However, one limitation cannot be ignored: the effect of pore-fluid pressure on the runout behaviour of debris flows (McCoy et al. 2010; Zheng et al. 2023) is not considered. A debris-flow event can be initiated when rainfall exceeds the triggering thresholds, but different rainfall intensities may result in different runout distances even when the sediment volume is fixed, because rainfall intensity affects the degree of saturation of the sediment mass and, in turn, its mobility. However, the sparse distribution of rain gauge stations restricts the acquisition of reliable rainfall data. Further study is therefore essential to improve travel distance estimation and to establish a reliable warning system in this area. Additionally, integrating more factors than the empirical equations do may also introduce redundant information and increase computational complexity. PCA therefore plays a critical role in dimension reduction by removing redundant information.
The analysis of debris flows, including initiation, flow and deposition, is a complicated task that requires an understanding of topographic features, environmental conditions, and geological settings. Machine learning offers an alternative that simplifies the treatment of the flow process and can achieve a reliable estimation or warning without fluid and mechanical analysis. XGBoost, used in this study, was developed from the Gradient Boosting Decision Tree (GBDT) by adding an L2 regularization term to avoid overfitting and by adopting a second-order expansion of the loss function (Dong et al. 2022). As a result, this model has attracted considerable attention in various fields owing to its good performance in both computation speed and prediction accuracy (Wang et al. 2022; Qiu et al. 2024a). For example, XGBoost performs better than LR in the medical field (Wang et al. 2022), outperforms ANN and SVM in predicting groundwater levels (Osman et al. 2021), and predicts concrete strength better than SVM and multilayer perceptron (MLP) (Nguyen et al. 2021). Overall, XGBoost is an effective machine learning model for prediction analysis and was therefore selected in our study. It is worth noting that we cannot conclude that our XGBoost-based model performs well in all fields and under all conditions. Its performance may vary with data structure and sample size, and because it was developed only with debris-flow data from the Nepal Himalayan mountain regions, its wider applicability is limited. Developing a model suitable for worldwide application remains a challenge that requires continuous input of debris-flow data from around the globe and cannot be completed in this study.
Although several empirical equations, such as that of Rickenmann (1999), incorporate data from different countries and regions, a simple fitted equation may not provide a reliable estimate when applied to a specific site. Machine learning can mitigate this limitation owing to its strong ability to find hidden and complex relationships, which supports the superiority and efficiency of our model in travel distance estimation.
Moreover, the loss of mass volume during the flowing process is ignored; we assume that all the failure mass arrives at the endpoint. This assumption may explain part of the estimation error: some estimates are larger than the actual values because the retention of material along the travel path is neglected. Despite this limitation, the model remains effective in estimating the travel distance of debris flows after the earthquake and clearly improves the estimation accuracy. Its application can be expanded with the input of more debris-flow data, which may substantially improve estimation accuracy. Overall, we provide a machine-learning-based method to estimate the travel distance of debris flows, which can support effective warning and mitigation of debris flows after earthquakes in mountainous areas.

7. Conclusion

In conclusion, we introduce a machine-learning-based method to estimate the travel distance of debris flows along the Nepal Himalayas after the 2015 Gorkha earthquake. First, the travel path and the center of the material source area are determined from the identified debris flows. Then, five factors related to the geomorphological and environmental conditions are initially selected: VL, H, J, C, and NDVI. A correlation analysis is conducted to examine the correlation between each variable and travel distance. Multicollinearity among the variables is then investigated, and NDVI is removed because it correlates weakly with travel distance. Finally, the remaining four variables are used to generate the principal components PC1, PC2 and PC3, which reduce the dimension of the input data and ensure model stability.
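The step from four factors to three principal components can be sketched with a plain singular value decomposition. This is an illustration under stated assumptions, not the procedure used in the paper: the six rows are events No. 1 to 6 copied from Table 3, the factors are standardized before the decomposition (the scaling applied in the study is not stated here), and the function name `pca_scores` is hypothetical.

```python
import numpy as np

# Columns: VL (m^3), H (m), J, C. Rows: events No. 1-6 copied from Table 3.
X = np.array([
    [2600, 860, 0.2569, -0.1032],
    [13800, 460, 0.7172, -0.2500],
    [5100, 410, 0.7115, -0.1799],
    [52200, 1410, 0.2934, -0.4365],
    [2500, 280, 0.3594, 0.0511],
    [1800, 220, 0.7448, -0.2479],
], dtype=float)


def pca_scores(data, n_components=3):
    """Return principal-component scores and explained-variance ratios."""
    # Standardize each factor so that VL does not dominate through its units
    # (an assumption of this sketch).
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Rows of vt are the principal axes, ordered by decreasing singular value.
    _, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = z @ vt[:n_components].T
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]


scores, ratio = pca_scores(X)
print(scores.shape)   # one row per event, one column per retained component
print(ratio)          # share of total variance carried by PC1, PC2, PC3
```

The retained columns are mutually uncorrelated by construction, which is the property that makes them attractive as model inputs when the raw factors are correlated.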
Moreover, the selected input data are normalized into the range of 0.01 to 0.99 and then separated into a training set and a testing set with a ratio of 7:3. The training set is used to train the GA-XGBoost estimation model, in which GA is employed to search for the optimal hyperparameters of XGBoost. RMSE, MAE and MAPE are used to evaluate the trained model; the results show a MAPE of 8.71%, an RMSE of 144.3 m and an MAE of 86.1 m. Additionally, to reveal the contribution of each variable to the estimation of travel distance, sixteen models were developed by excluding factors from model development, including the PCA model (PC1 + PC2 + PC3), Model 1 (VL + H + J + C), Model 2 (VL + H + J), Model 3 (VL + H + C), Model 4 (H + J + C), Model 5 (VL + H), Model 6 (H + J), Model 7 (H + C), Model 8 (VL + J), Model 9 (VL + C), Model 10 (J + C), Model 11 (H), Model 12 (VL), Model 13 (J), and Model 14 (C). The performances of these models were again evaluated using the three indexes. The results show that all four factors must be incorporated into model development if high accuracy is expected. The factor combination proposed in our study is suitable for estimating the travel distance of debris flows after the earthquake. Finally, we compared the estimation model with existing empirical equations. Our proposed model performs best because its estimates are the closest to the actual values. This model can therefore effectively estimate the travel distance of debris flows after the earthquake, although slight fluctuations in estimation accuracy may be inevitable because of different topographic conditions if the model is applied to other areas.
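The normalization, the 7:3 split and the three evaluation indexes can be written compactly. The sketch below is illustrative only: the helper names, the random seed, the rounding used for the split and the predicted values are assumptions made for this example and are not taken from the study.

```python
import numpy as np


def scale_to_range(x, low=0.01, high=0.99):
    """Min-max scale each column of x into [low, high]."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    # Assumes no column is constant; a constant column would divide by zero.
    return low + (x - x_min) * (high - low) / (x_max - x_min)


def split_7_3(n_samples, seed=0):
    """Return shuffled index arrays for a 7:3 training/testing split."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    n_train = int(round(0.7 * n_samples))
    return order[:n_train], order[n_train:]


def rmse(y_true, y_pred):
    """Root mean square error, in the units of y."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error, in the units of y."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs(y_true - y_pred)))


def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0)


train_idx, test_idx = split_7_3(10)
print(len(train_idx), len(test_idx))

measured = [1020, 580, 520, 4990, 690]    # L of events No. 1-5 in Table 3 (m)
predicted = [1000, 600, 500, 5100, 700]   # hypothetical values, not model output
print(rmse(measured, predicted), mae(measured, predicted), mape(measured, predicted))
```

RMSE weights large misses more heavily than MAE, while MAPE expresses the error relative to each measured distance, so reporting all three gives complementary views of the same residuals.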

Acknowledgements

This work was financially supported by the European Union’s Horizon 2020 research and innovation program Marie Skłodowska-Curie Actions Research and Innovation Staff Exchange (RISE) (Grant No 778360). For the purpose of open access, the author has applied a Creative Commons Attribution (CC-BY) licence to any Author Accepted Manuscript version arising from the submission.

Declarations

Competing interests

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix

1. The mathematical details of XGBoost are presented as follows:
$$L\left(\phi\right)=\sum_{i}\ell\left(y_i,y_{ipre}\right)+\sum_{k}\Omega\left(f_k\right)=\sum_{i}\ell\left(y_i,y_{ipre}\right)+\gamma T+\frac{1}{2}\lambda\left\|\omega\right\|^{2}$$
(14)
where \(\sum_{i}\ell\left(y_i,y_{ipre}\right)\) is the loss function and \(\sum_{k}\Omega\left(f_k\right)\) is the regularization term; \(y_i\) and \(y_{ipre}\) are the true value and the predicted value, respectively. In the regularization term, \(T\) is the number of leaves of a tree, \(\omega\) is the vector of leaf weights, and \(\gamma\) and \(\lambda\) are regularization coefficients. Each newly developed tree fits the residual error of the previous prediction, so the prediction after the generation of the \(t\)th tree is:
$$y_{ipre}^{t}=y_{ipre}^{t-1}+f_t\left(x_i\right)$$
(15)
To minimize the objective function, a second-order Taylor expansion is taken around \(f_t=0\):
$$L^{t}\simeq\sum_{i=1}^{n}\left[\ell\left(y_i,y_{ipre}^{t-1}\right)+g_i f_t\left(x_i\right)+\frac{1}{2}h_i f_t^{2}\left(x_i\right)\right]+\gamma T+\frac{1}{2}\lambda\left\|\omega\right\|^{2}$$
(16)
where \(g_i\) is the first-order derivative (\(g_i=\partial_{y_{ipre}^{t-1}}\ell\left(y_i,y_{ipre}^{t-1}\right)\)) and \(h_i\) is the second-order derivative (\(h_i=\partial_{y_{ipre}^{t-1}}^{2}\ell\left(y_i,y_{ipre}^{t-1}\right)\)). The newly generated principal components were employed in place of the raw data to develop the prediction model. To develop an optimal model for travel distance prediction, a genetic algorithm (GA) was employed to obtain the optimal hyperparameters.
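To make the role of \(g_i\) and \(h_i\) in Eq. (16) concrete, the sketch below evaluates them for a squared-error loss and computes the weight of a single leaf that minimizes the expanded objective. The choice of loss \(\ell=\frac{1}{2}(y_i-y_{ipre})^2\), the single-leaf tree and the function names are assumptions of this example. The closed-form weight \(-G/(H+\lambda)\), with \(G=\sum g_i\) and \(H=\sum h_i\), is the standard XGBoost result and is not spelled out in this appendix.

```python
import numpy as np


def grad_hess_squared_error(y_true, y_prev):
    """g_i and h_i of Eq. (16) for the assumed loss 0.5 * (y - y_prev)^2."""
    y_true = np.asarray(y_true, dtype=float)
    y_prev = np.asarray(y_prev, dtype=float)
    g = y_prev - y_true          # first derivative with respect to the prediction
    h = np.ones_like(y_true)     # second derivative is constant for this loss
    return g, h


def optimal_leaf_weight(g, h, lam):
    """Weight w minimizing sum(g*w + 0.5*h*w^2) + 0.5*lam*w^2 for one leaf."""
    return -g.sum() / (h.sum() + lam)


def leaf_objective(g, h, lam, gamma):
    """Value of the expanded objective at the optimal weight (constants dropped, T = 1)."""
    return -0.5 * g.sum() ** 2 / (h.sum() + lam) + gamma


# Travel distances L of events No. 1-3 in Table 3 (m); previous prediction is zero.
y = np.array([1020.0, 580.0, 520.0])
g, h = grad_hess_squared_error(y, np.zeros_like(y))

print(optimal_leaf_weight(g, h, lam=0.0))   # without regularization: the sample mean
print(optimal_leaf_weight(g, h, lam=1.0))   # lambda shrinks the weight toward zero
```

With \(\lambda=0\) the leaf weight reduces to the mean residual, and increasing \(\lambda\) shrinks it, which is how the L2 term in Eq. (14) limits overfitting.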
2. Debris-flow database for model development (Table 3).
Table 3
Database of model training

| No. | VL (m³) | H (m) | J | C | L (m) | Lon (°) | Lat (°) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 2600 | 860 | 0.2569 | -0.1032 | 1020 | 85.2781 | 27.8714 |
| 2 | 13800 | 460 | 0.7172 | -0.2500 | 580 | 85.3128 | 28.1443 |
| 3 | 5100 | 410 | 0.7115 | -0.1799 | 520 | 86.4884 | 27.7117 |
| 4 | 52200 | 1410 | 0.2934 | -0.4365 | 4990 | 85.2119 | 27.9729 |
| 5 | 2500 | 280 | 0.3594 | 0.0511 | 690 | 85.5100 | 27.8904 |
| 6 | 1800 | 220 | 0.7448 | -0.2479 | 290 | 85.6546 | 28.0144 |
| 7 | 3900 | 300 | 0.4500 | -0.1256 | 480 | 85.5811 | 27.9437 |
| 8 | 800 | 550 | 0.4136 | -0.0815 | 880 | 85.5636 | 27.9385 |
| 9 | 9600 | 690 | 0.5344 | -0.2138 | 1250 | 85.7869 | 28.0052 |
| 10 | 2700 | 910 | 0.4063 | -0.5347 | 2070 | 86.2283 | 27.8235 |
| 11 | 1200 | 320 | 0.8765 | -0.2014 | 340 | 86.2279 | 27.8262 |
| 12 | 3900 | 1430 | 0.0562 | -0.1339 | 1940 | 85.7818 | 28.0131 |
| 13 | 6800 | 890 | 0.5331 | -0.3187 | 1180 | 84.2370 | 28.3759 |
| 14 | 15100 | 720 | 0.8012 | -0.2386 | 820 | 85.4167 | 28.1572 |
| 15 | 2400 | 820 | 0.2680 | -0.1025 | 1084 | 85.4111 | 28.1561 |
| 16 | 15600 | 690 | 0.5440 | 0.0022 | 1250 | 86.4532 | 27.6750 |
| 17 | 2800 | 210 | 0.9841 | -0.1833 | 440 | 86.4584 | 27.6475 |
| 18 | 700 | 390 | 0.4239 | -0.1444 | 460 | 85.9731 | 27.6653 |
| 19 | 500 | 630 | 0.4310 | -0.3072 | 840 | 86.2633 | 27.4717 |
| 20 | 3300 | 740 | 0.4758 | -0.4286 | 1320 | 86.2759 | 27.4639 |
| 21 | 4300 | 940 | 0.7849 | -0.1327 | 1710 | 87.4505 | 27.6845 |
| 22 | 3300 | 470 | 0.6431 | -0.0046 | 650 | 87.3592 | 27.6587 |
| 23 | 23200 | 960 | 0.4973 | -0.3111 | 1500 | 86.3468 | 27.5231 |
| 24 | 18500 | 1290 | 0.5560 | -0.3085 | 1590 | 85.1594 | 28.2437 |
| 25 | 3000 | 450 | 0.6049 | -0.6971 | 1430 | 85.1146 | 28.2106 |
| 26 | 8700 | 420 | 0.5518 | -0.2756 | 560 | 85.0556 | 28.4876 |
| 27 | 7200 | 570 | 0.3222 | -0.3115 | 1170 | 84.9768 | 27.9742 |
| 28 | 1700 | 620 | 0.6637 | -0.0024 | 1020 | 84.6402 | 28.2432 |
| 29 | 800 | 690 | 0.5144 | 0.0397 | 970 | 86.0390 | 27.7311 |
| 30 | 1400 | 790 | 0.2080 | -0.5000 | 1250 | 85.2988 | 28.0377 |
| 31 | 3100 | 750 | 0.6126 | -0.1840 | 870 | 85.3002 | 28.0365 |
| 32 | 6500 | 310 | 0.5500 | -0.1420 | 440 | 86.4889 | 27.7159 |
| 33 | 9400 | 1200 | 0.1866 | -0.7565 | 4020 | 86.1039 | 27.8649 |
| 34 | 37500 | 1970 | 0.0445 | -0.1520 | 8010 | 86.4568 | 27.7036 |
| 35 | 1600 | 650 | 0.3473 | -0.3333 | 1220 | 84.0933 | 28.3297 |
| 36 | 16800 | 830 | 0.3828 | -0.5745 | 2030 | 86.9857 | 27.3992 |
| 37 | 35000 | 380 | 0.1807 | -0.1159 | 3110 | 86.5469 | 27.3544 |
| 38 | 3800 | 270 | 0.4800 | 0.0159 | 500 | 86.6788 | 27.5939 |
| 39 | 10400 | 680 | 0.2732 | 0.1023 | 820 | 86.5494 | 27.3487 |
| 40 | 19200 | 390 | 0.6146 | -0.4059 | 942 | 86.6884 | 27.6050 |
| 41 | 13900 | 780 | 0.3475 | -0.1528 | 900 | 84.1071 | 28.2221 |
| 42 | 6700 | 1580 | 0.0755 | -0.5974 | 3230 | 85.6468 | 28.0022 |
| 43 | 10400 | 900 | 0.3523 | -0.7157 | 2220 | 85.0781 | 28.0128 |
| 44 | 79900 | 1880 | 0.1037 | -0.6030 | 7390 | 85.6688 | 27.9999 |
| 45 | 13500 | 780 | 0.3512 | -0.5667 | 2037 | 85.6780 | 28.0112 |
| 46 | 9400 | 280 | 0.7818 | -0.2166 | 360 | 87.1681 | 27.5213 |
| 47 | 17700 | 340 | 0.4902 | -0.1556 | 610 | 87.2326 | 27.5415 |
| 48 | 12000 | 430 | 0.5073 | -0.4910 | 1100 | 86.8643 | 27.4115 |
| 49 | 800 | 550 | 0.3081 | -0.2712 | 860 | 86.8209 | 27.5331 |
| 50 | 4300 | 290 | 0.7353 | -0.0317 | 340 | 86.8314 | 27.5198 |
| 51 | 5400 | 350 | 0.4233 | -0.2727 | 430 | 86.8268 | 27.5209 |
| 52 | 1300 | 570 | 0.3425 | -0.3086 | 870 | 84.5119 | 28.2626 |
| 53 | 3600 | 1070 | 0.1419 | -0.2908 | 2170 | 84.5904 | 28.3425 |
| 54 | 5200 | 320 | 0.8079 | -0.2020 | 380 | 84.5941 | 28.3574 |
| 55 | 1000 | 320 | 0.7378 | 0.0511 | 450 | 84.7527 | 28.2203 |
| 56 | 2400 | 1130 | 0.1429 | -0.2438 | 2317 | 84.7108 | 28.2760 |
| 57 | 4500 | 600 | 0.6188 | -0.4038 | 1120 | 84.7164 | 28.2843 |
| 58 | 21400 | 710 | 0.5553 | -0.6153 | 1610 | 85.0010 | 28.2315 |
| 59 | 1100 | 500 | 0.3515 | -0.3157 | 907 | 85.8357 | 28.0159 |
| 60 | 4200 | 270 | 0.4425 | -0.1667 | 400 | 86.0654 | 27.8344 |
| 61 | 9100 | 550 | 0.5866 | -0.0732 | 936 | 86.0541 | 27.8426 |
| 62 | 17200 | 820 | 0.3664 | -0.7108 | 2200 | 86.1144 | 27.7254 |
| 63 | 1700 | 470 | 0.5400 | -0.2162 | 450 | 86.0886 | 27.7078 |
| 64 | 10500 | 370 | 0.5214 | -0.2593 | 560 | 85.9198 | 27.8173 |
| 65 | 6500 | 580 | 0.3549 | -0.1790 | 1220 | 85.8831 | 27.8137 |
| 66 | 3100 | 320 | 0.3647 | -0.2222 | 510 | 85.8759 | 27.8234 |
| 67 | 7900 | 420 | 0.8915 | -0.087 | 590 | 86.2178 | 27.8935 |
| 68 | 1300 | 280 | 0.3932 | -0.2924 | 440 | 86.2146 | 27.8968 |
| 69 | 14200 | 140 | 0.5853 | -0.2492 | 680 | 85.9686 | 27.9317 |
| 70 | 5800 | 240 | 0.7806 | -0.2000 | 310 | 85.9604 | 27.9355 |
| 71 | 1600 | 370 | 0.6538 | -0.3076 | 571 | 86.3910 | 27.6595 |
| 72 | 3000 | 550 | 0.2700 | -0.0337 | 760 | 86.3895 | 27.6557 |
| 73 | 900 | 470 | 0.6600 | -0.3038 | 550 | 85.3990 | 27.9902 |
| 74 | 18000 | 560 | 0.6988 | -0.0523 | 800 | 85.3213 | 27.9753 |
| 75 | 1300 | 220 | 0.5171 | -0.3012 | 410 | 84.6914 | 28.2285 |
| 76 | 8500 | 660 | 0.4272 | -0.3795 | 1730 | 84.2015 | 28.2752 |
| 77 | 3800 | 260 | 0.8240 | -0.0202 | 250 | 84.2095 | 28.3369 |
| 78 | 1400 | 740 | 0.6103 | -0.3083 | 890 | 86.3142 | 27.7763 |
| 79 | 3000 | 780 | 0.3316 | -0.1307 | 1140 | 86.3236 | 27.7797 |
| 80 | 6200 | 920 | 0.5330 | -0.4255 | 1090 | 86.2283 | 27.7594 |
| 81 | 4000 | 280 | 0.5674 | -0.0899 | 460 | 86.2261 | 27.7576 |
| 82 | 3200 | 520 | 0.2800 | -0.0215 | 750 | 83.8930 | 28.3293 |
| 83 | 400 | 610 | 0.5970 | -0.1753 | 990 | 83.7466 | 28.4383 |
| 84 | 700 | 190 | 0.4730 | -0.0058 | 370 | 85.6329 | 27.9214 |
| 85 | 20100 | 310 | 0.6160 | -0.4035 | 1000 | 86.4543 | 27.7029 |
| 86 | 1500 | 220 | 1.1428 | -0.0370 | 200 | 84.7213 | 28.2480 |
| 87 | 16900 | 590 | 0.7325 | -0.2762 | 800 | 84.7121 | 28.2124 |
| 88 | 18600 | 610 | 0.5102 | 0.0000 | 880 | 85.3737 | 28.2646 |
| 89 | 22400 | 600 | 0.4403 | -0.0833 | 720 | 86.0081 | 27.8534 |
Literature
go back to reference Adhikari DP, Koshimizu S (2005) Debris flow disaster at Larcha, upper Bhotekoshi Valley, central Nepal. Isl Arc 14:410–423CrossRef Adhikari DP, Koshimizu S (2005) Debris flow disaster at Larcha, upper Bhotekoshi Valley, central Nepal. Isl Arc 14:410–423CrossRef
go back to reference Booth AM, Sifford C, Vascik B, Siebert C, Buma B (2020) Large wood inhibits debris flow runout in forested southeast Alaska. Earth Surf Process Land 45:1555–1568CrossRef Booth AM, Sifford C, Vascik B, Siebert C, Buma B (2020) Large wood inhibits debris flow runout in forested southeast Alaska. Earth Surf Process Land 45:1555–1568CrossRef
go back to reference Bowerman BL, O’Connell RT, Koehler AB (2005) Forecasting, time series, and regression: an applied approach. 4th edn. Duxbury Press, p 686 Bowerman BL, O’Connell RT, Koehler AB (2005) Forecasting, time series, and regression: an applied approach. 4th edn. Duxbury Press, p 686
go back to reference Cascini L, Cuomo S, Pastor M, Sorbino G, Piciullo L (2014) SPH run-out modelling of channelised landslides of the flow type. Geomorphology 214:502–513CrossRef Cascini L, Cuomo S, Pastor M, Sorbino G, Piciullo L (2014) SPH run-out modelling of channelised landslides of the flow type. Geomorphology 214:502–513CrossRef
go back to reference Chai T, Draxler RR (2014) Root mean square error (RMSE) or mean absolute error (MAE)? -Arguments against avoiding RMSE in the literature. Geosci Model Dev 7:1247–1250CrossRef Chai T, Draxler RR (2014) Root mean square error (RMSE) or mean absolute error (MAE)? -Arguments against avoiding RMSE in the literature. Geosci Model Dev 7:1247–1250CrossRef
go back to reference Chaib S, Gu Y, Yao H (2015) An informative feature selection method based on sparse PCA for VHR scene classification. IEEE Geosci Remote Sens Lett 13:147–151CrossRef Chaib S, Gu Y, Yao H (2015) An informative feature selection method based on sparse PCA for VHR scene classification. IEEE Geosci Remote Sens Lett 13:147–151CrossRef
go back to reference Corominas J (1996) The angle of reach as a mobility index for small and large landslides. Can Geotech J 33:260–271CrossRef Corominas J (1996) The angle of reach as a mobility index for small and large landslides. Can Geotech J 33:260–271CrossRef
go back to reference Corominas J, van Westen C, Frattini P, Cascini L, Malet JP, Fotopoulou S, Catani F, Van Den Eeckhaut M, Mavrouli O, Agliardi F, Pitilakis K, Winter MG, Pastor M, Ferlisi S, Tofani V, Hervás J, Smith JT (2014) Recommendations for the quantitative analysis of landslide risk. Bull Eng Geol Environ 73:209–263 Corominas J, van Westen C, Frattini P, Cascini L, Malet JP, Fotopoulou S, Catani F, Van Den Eeckhaut M, Mavrouli O, Agliardi F, Pitilakis K, Winter MG, Pastor M, Ferlisi S, Tofani V, Hervás J, Smith JT (2014) Recommendations for the quantitative analysis of landslide risk. Bull Eng Geol Environ 73:209–263
go back to reference Crosta GB, Dal Negro P, Frattini P (2003) Soil slips and debris flows on terraced slopes. Nat Hazards Earth Syst Sci 3:31–42CrossRef Crosta GB, Dal Negro P, Frattini P (2003) Soil slips and debris flows on terraced slopes. Nat Hazards Earth Syst Sci 3:31–42CrossRef
go back to reference Dahlquist MP, West AJ (2019) Initiation and runout of post-seismic debris flows: insights from the 2015 Gorkha Earthquake. Geophys Res Lett 46:9658–9668CrossRef Dahlquist MP, West AJ (2019) Initiation and runout of post-seismic debris flows: insights from the 2015 Gorkha Earthquake. Geophys Res Lett 46:9658–9668CrossRef
go back to reference Devoli G, De Blasio FV, Elverhøi A, Høeg K (2009) Statistical analysis of landslide events in Central America and their run-out distance. Geotech Geol Eng 27:23–42CrossRef Devoli G, De Blasio FV, Elverhøi A, Høeg K (2009) Statistical analysis of landslide events in Central America and their run-out distance. Geotech Geol Eng 27:23–42CrossRef
go back to reference Dong JW, Chen Y, Yao BY, Zhang X, Zeng NF (2022) A neural network boosting regression model based on XGBoost. Appl Soft Comput 125:109067CrossRef Dong JW, Chen Y, Yao BY, Zhang X, Zeng NF (2022) A neural network boosting regression model based on XGBoost. Appl Soft Comput 125:109067CrossRef
go back to reference Du J, Glade T, Woldai T, Chai B, Zeng B (2020) Landslide susceptibility assessment based on an incomplete landslide inventory in the Jilong Valley, Tibet, Chinese Himalayas. Eng Geol 270:105572CrossRef Du J, Glade T, Woldai T, Chai B, Zeng B (2020) Landslide susceptibility assessment based on an incomplete landslide inventory in the Jilong Valley, Tibet, Chinese Himalayas. Eng Geol 270:105572CrossRef
go back to reference Falconi LM, Moretti L, Puglisi C, Righini G (2023) Debris and mud flows runout assessment: a comparison among empirical geometric equations in the Giampilieri and Briga basins (east Sicily, Italy) affected by the event of October 1, 2009. Nat Hazards 117(3):2347–2373 Falconi LM, Moretti L, Puglisi C, Righini G (2023) Debris and mud flows runout assessment: a comparison among empirical geometric equations in the Giampilieri and Briga basins (east Sicily, Italy) affected by the event of October 1, 2009. Nat Hazards 117(3):2347–2373
go back to reference Fuchs G (1977) Traverse of Zanskar from the Indus to the Valley of Kashmir—a preliminary note. Jahrb Der Geol Bundesanstalt 120:219–229 Fuchs G (1977) Traverse of Zanskar from the Indus to the Valley of Kashmir—a preliminary note. Jahrb Der Geol Bundesanstalt 120:219–229
go back to reference Guns M, Vanacker V (2012) Logistic regression applied to natural hazards: rare event logistic regression with replications. Nat Hazards Earth Syst Sci 12:1937–1947CrossRef Guns M, Vanacker V (2012) Logistic regression applied to natural hazards: rare event logistic regression with replications. Nat Hazards Earth Syst Sci 12:1937–1947CrossRef
go back to reference Guo C, Zhang Y, Montgomery DR, Du Y, Zhang G, Wang S (2016) How unusual is the long-runout of the earthquake-triggered giant Luanshibao landslide, Tibetan Plateau, China? Geomorphology 259:145–154 Guo C, Zhang Y, Montgomery DR, Du Y, Zhang G, Wang S (2016) How unusual is the long-runout of the earthquake-triggered giant Luanshibao landslide, Tibetan Plateau, China? Geomorphology 259:145–154
go back to reference Hürlimann M, McArdell BW, Rickli C (2015) Field and laboratory analysis of the runout characteristics of hillslope Debris flows in Switzerland. Geomorphology 232:20–32CrossRef Hürlimann M, McArdell BW, Rickli C (2015) Field and laboratory analysis of the runout characteristics of hillslope Debris flows in Switzerland. Geomorphology 232:20–32CrossRef
go back to reference Institute of Mountian Hazards and Environment (IMHE) (1994) Flood, Debris flow, landslide hazard and control. Science Publications. ((in Chinese)) Institute of Mountian Hazards and Environment (IMHE) (1994) Flood, Debris flow, landslide hazard and control. Science Publications. ((in Chinese))
go back to reference Khosravi K, Khozani ZS, Mao L (2021) A comparison between advanced hybrid machine learning algorithms and empirical equations applied to abutment scour depth prediction. J Hydrol 596:126100CrossRef Khosravi K, Khozani ZS, Mao L (2021) A comparison between advanced hybrid machine learning algorithms and empirical equations applied to abutment scour depth prediction. J Hydrol 596:126100CrossRef
go back to reference Legros F (2002) The mobility of long-runout landslides. Eng Geol 63(3–4):301–331 Legros F (2002) The mobility of long-runout landslides. Eng Geol 63(3–4):301–331
go back to reference Li Z, He Y, An W, Song L, Zhang W, Catto N, Wang Y, Wang S, Liu H, Cao W (2011) Climate and glacier change in southwestern China during the past several decades. Environ Res Lett 6:45404CrossRef Li Z, He Y, An W, Song L, Zhang W, Catto N, Wang Y, Wang S, Liu H, Cao W (2011) Climate and glacier change in southwestern China during the past several decades. Environ Res Lett 6:45404CrossRef
go back to reference Lin GF, Chang MJ, Huang YC, Ho JY (2017) Assessment of susceptibility to rainfall-induced landslides using improved self-organizing linear output map, support vector machine, and logistic regression. Eng Geol 224:62–74CrossRef Lin GF, Chang MJ, Huang YC, Ho JY (2017) Assessment of susceptibility to rainfall-induced landslides using improved self-organizing linear output map, support vector machine, and logistic regression. Eng Geol 224:62–74CrossRef
go back to reference Lorente A, Beguería S, Bathurst JC, García-Ruiz JM (2003) Debris flow characteristics and relationships in the Central Spanish Pyrenees. Nat Hazards Earth Syst Sci 3:683–692CrossRef Lorente A, Beguería S, Bathurst JC, García-Ruiz JM (2003) Debris flow characteristics and relationships in the Central Spanish Pyrenees. Nat Hazards Earth Syst Sci 3:683–692CrossRef
go back to reference Lv Q, Liu Y, Yang Q (2017) Stability analysis of earthquake-induced rock slope based on back analysis of shear strength parameters of rock mass. Eng Geol 228:39–49CrossRef Lv Q, Liu Y, Yang Q (2017) Stability analysis of earthquake-induced rock slope based on back analysis of shear strength parameters of rock mass. Eng Geol 228:39–49CrossRef
go back to reference McCoy SW, Kean JW, Coe JA, Staley DM, Wasklewicz TA, Tucker GE (2010) Evolution of a natural debris flow: in situ measurements of flow dynamics, video imagery, and terrestrial laser scanning. Geology 38(8):735–738CrossRef McCoy SW, Kean JW, Coe JA, Staley DM, Wasklewicz TA, Tucker GE (2010) Evolution of a natural debris flow: in situ measurements of flow dynamics, video imagery, and terrestrial laser scanning. Geology 38(8):735–738CrossRef
go back to reference McDougall S (2017) 2014 Canadian geotechnical colloquium: landslide runout analysis—current practice and challenges. Can Geotech J 54:605–620CrossRef McDougall S (2017) 2014 Canadian geotechnical colloquium: landslide runout analysis—current practice and challenges. Can Geotech J 54:605–620CrossRef
go back to reference Michelini T, Bettella F, D’Agostino V (2017) Field investigations of the interaction between debris flows and forest vegetation in two Alpine fans. Geomorphology 279:150–164CrossRef Michelini T, Bettella F, D’Agostino V (2017) Field investigations of the interaction between debris flows and forest vegetation in two Alpine fans. Geomorphology 279:150–164CrossRef
go back to reference Nguyen H, Vu T, Vo TP, Thai HT (2021) Efficient machine learning models for prediction of concrete strengths. Constr Build Mater 266:120950CrossRef Nguyen H, Vu T, Vo TP, Thai HT (2021) Efficient machine learning models for prediction of concrete strengths. Constr Build Mater 266:120950CrossRef
go back to reference Osman AIA, Ahmed AN, Chow MF, Huang YF, El-Shafie A (2021) Extreme gradient boosting (Xgboost) model to predict the groundwater levels in Selangor Malaysia. Ain Shams Eng J 12:1545–1556CrossRef Osman AIA, Ahmed AN, Chow MF, Huang YF, El-Shafie A (2021) Extreme gradient boosting (Xgboost) model to predict the groundwater levels in Selangor Malaysia. Ain Shams Eng J 12:1545–1556CrossRef
go back to reference Paudel B, Fall M, Daneshfar B (2020) GIS-based assessment of debris flow hazards in Kulekhani Watershed. Nepal Nat Hazards 101:143–172CrossRef Paudel B, Fall M, Daneshfar B (2020) GIS-based assessment of debris flow hazards in Kulekhani Watershed. Nepal Nat Hazards 101:143–172CrossRef
go back to reference Paudel B, Fall M, Daneshfar B (2021) Gis-based assessment of debris flow runout in Kulekhani Watershed, Nepal. Geotech Geol Eng 39:2755–2775CrossRef Paudel B, Fall M, Daneshfar B (2021) Gis-based assessment of debris flow runout in Kulekhani Watershed, Nepal. Geotech Geol Eng 39:2755–2775CrossRef
go back to reference Phillips CJ, Davies TR (1991) Determining rheological parameters of debris flow material. Geomorphology 4(2):101–110CrossRef Phillips CJ, Davies TR (1991) Determining rheological parameters of debris flow material. Geomorphology 4(2):101–110CrossRef
go back to reference Prochaska AB, Santi PM, Higgins JD, Cannon SH (2008) Debris-flow runout predictions based on the average channel slope (ACS). Eng Geol 98:29–40CrossRef Prochaska AB, Santi PM, Higgins JD, Cannon SH (2008) Debris-flow runout predictions based on the average channel slope (ACS). Eng Geol 98:29–40CrossRef
go back to reference Puglisi C, Falconi L, Gioè C, Leoni G (2015) Contribution to the runout evaluation of potential debris flows in Peloritani Mountains (Messina, Italy). In: Engineering geology for society and territory-Voulme2: landslide processes. Springer International Publishing, pp 509–513 Puglisi C, Falconi L, Gioè C, Leoni G (2015) Contribution to the runout evaluation of potential debris flows in Peloritani Mountains (Messina, Italy). In: Engineering geology for society and territory-Voulme2: landslide processes. Springer International Publishing, pp 509–513
go back to reference Qiu C, Su L, Geng X (2024a) A precipitation downscaling framework for regional warning of debris flows in mountainous areas. Nat Hazards 120(2):1979–2004 Qiu C, Su L, Geng X (2024a) A precipitation downscaling framework for regional warning of debris flows in mountainous areas. Nat Hazards 120(2):1979–2004
go back to reference Qiu C, Su L, Pasuto A, Bossi G, Geng X (2024b) Economic risk assessment of future debris flows by machine learning method. Int J Disaster Risk Sci 15(1):149–164 Qiu C, Su L, Pasuto A, Bossi G, Geng X (2024b) Economic risk assessment of future debris flows by machine learning method. Int J Disaster Risk Sci 15(1):149–164
go back to reference Qiu C, Su L, Bian C, Zhao B Geng X (2024c) An AI-based method for estimating the potential runout distance of post-seismic debris flows. Int J Disaster Risk Sci pp 1–14 Qiu C, Su L, Bian C, Zhao B Geng X (2024c) An AI-based method for estimating the potential runout distance of post-seismic debris flows. Int J Disaster Risk Sci pp 1–14
go back to reference Qiu C, Su L, Zou Q, Geng X (2022) A hybrid machine-learning model to map glacier-related debris flow susceptibility along Gyirong Zangbo watershed under the changing climate. Sci Total Environ 818:151752CrossRef Qiu C, Su L, Zou Q, Geng X (2022) A hybrid machine-learning model to map glacier-related debris flow susceptibility along Gyirong Zangbo watershed under the changing climate. Sci Total Environ 818:151752CrossRef
go back to reference Rahmati O, Tahmasebipour N, Haghizadeh A, Pourghasemi HR, Feizizadeh B (2017) Evaluation of different machine learning models for predicting and mapping the susceptibility of gully erosion. Geomorphology 298:118–137CrossRef Rahmati O, Tahmasebipour N, Haghizadeh A, Pourghasemi HR, Feizizadeh B (2017) Evaluation of different machine learning models for predicting and mapping the susceptibility of gully erosion. Geomorphology 298:118–137CrossRef
go back to reference Regmi NR, Giardino JR, Vitek JD (2010) Modeling susceptibility to landslides using the weight of evidence approach: Western Colorado, USA. Geomorphology 115:172–187CrossRef Regmi NR, Giardino JR, Vitek JD (2010) Modeling susceptibility to landslides using the weight of evidence approach: Western Colorado, USA. Geomorphology 115:172–187CrossRef
go back to reference Rickenmann D (1999) Empirical relationships for debris flows. Nat Hazards 19:47–77CrossRef Rickenmann D (1999) Empirical relationships for debris flows. Nat Hazards 19:47–77CrossRef
go back to reference Roback K, Clark MK, West AJ, Zekkos D, Li G, Gallen SF, Chamlagain D, Godt JW (2018) The size, distribution, and mobility of landslides caused by the 2015 Mw7. 8 Gorkha earthquake, Nepal. Geomorphology 301:121–138CrossRef Roback K, Clark MK, West AJ, Zekkos D, Li G, Gallen SF, Chamlagain D, Godt JW (2018) The size, distribution, and mobility of landslides caused by the 2015 Mw7. 8 Gorkha earthquake, Nepal. Geomorphology 301:121–138CrossRef
go back to reference Shieh CL, Chen YS, Tsai YJ, Wu JH (2009) Variability in rainfall threshold for debris flow after the Chi-Chi earthquake in central Taiwan, China. Int J Sediment Res 24:177–188CrossRef Shieh CL, Chen YS, Tsai YJ, Wu JH (2009) Variability in rainfall threshold for debris flow after the Chi-Chi earthquake in central Taiwan, China. Int J Sediment Res 24:177–188CrossRef
go back to reference Tang C, Jiang Z, Li W (2015) Seismic landslide evolution and debris flow development: a case study in the Hongchun Catchment, Wenchuan area of China. In: Engineering geology for society and territory-volume 2: landslide processes. Springer International Publishing, pp 445–449 Tang C, Jiang Z, Li W (2015) Seismic landslide evolution and debris flow development: a case study in the Hongchun Catchment, Wenchuan area of China. In: Engineering geology for society and territory-volume 2: landslide processes. Springer International Publishing, pp 445–449
Tang C, Zhu J, Li WL, Liang JT (2009) Rainfall-triggered debris flows following the Wenchuan earthquake. Bull Eng Geol Environ 68:187–194
Upreti BN (1999) An overview of the stratigraphy and tectonics of the Nepal Himalaya. J Asian Earth Sci 17:577–606
Wang RR, Wang LP, Zhang J, He M, Xu JG (2022) XGBoost machine learning algorism performed better than regression models in predicting mortality of moderate-to-severe traumatic brain injury. World Neurosurg 163:e617–e622
Willmott CJ, Matsuura K (2005) Advantages of the mean absolute error (MAE) over the root mean square error (RMSE) in assessing average model performance. Clim Res 30:79–82
Yu FC, Chen CY, Chen TC, Hung FY, Lin SC (2006) A GIS process for delimitating areas potentially endangered by debris flow. Nat Hazards 37:169–189
Zhan W, Fan X, Huang R, Pei X, Xu Q, Li W (2017) Empirical prediction for travel distance of channelized rock avalanches in the Wenchuan earthquake area. Nat Hazards Earth Syst Sci 17:833–844
Zhang S, Zhang LM, Chen H, Yuan Q, Pan H (2013) Changes in runout distances of debris flows over time in the Wenchuan earthquake zone. J Mt Sci 10:281–292
Zheng H, Shi Z, Kaitna R, Zhao F, de Haas T, Hanley KJ (2023) Control mechanisms of pore-pressure dissipation in debris flows. Eng Geol 317:107076
Zhou W, Fang J, Tang C, Yang G (2019) Empirical relationships for the estimation of debris flow runout distances on depositional fans in the Wenchuan earthquake zone. J Hydrol 577:123932
Metadata
Title
Travel distance estimation of landslide-induced debris flows by machine learning method in Nepal Himalaya after the Gorkha earthquake
Authors
Chenchen Qiu
Xueyu Geng
Publication date
01-10-2024
Publisher
Springer Berlin Heidelberg
Published in
Bulletin of Engineering Geology and the Environment / Issue 10/2024
Print ISSN: 1435-9529
Electronic ISSN: 1435-9537
DOI
https://doi.org/10.1007/s10064-024-03883-8
