## 1. Introduction

^{5} m^{3} or even greater due to the entrainment of large quantities of material during the flow process (Crosta et al. 2003), finally leading to a long travel distance and causing severe loss of property and human life (Dahlquist and West 2019; Qiu et al. 2024b). Therefore, estimating the travel distance of debris flows in the aftermath of earthquakes on a regional scale is of paramount importance, as it helps in identifying high-risk areas and developing effective mitigation strategies (Cascini et al. 2014; Corominas et al. 2014; Paudel et al. 2020; Zhang et al. 2013; Zhou et al. 2019).

## 2. Study area

## 3. Methodology

### 3.1 Selection of disposing factors

\(V_{L}\)), the drop height between the centre of the source area and the endpoint of the mass movement (H), the mean gradient of the travelling path (J), the mean curvature of the travelling path (C) and the normalized difference vegetation index (NDVI).

\(V_{L}\) and H serve as indicators of the potential energy stored within the failure mass, offering insights into its subsequent movement distance (Roback et al. 2018; Zhan et al. 2017; Puglisi et al. 2015; Qiu et al. 2024c). A greater sediment volume normally results in a longer runout distance (Legros et al. 2002; Guo et al. 2016; Falconi et al. 2023). The mean gradient of the travelling path (J) has been shown to correlate strongly with the travel distance (Rickenmann 1999). Notably, our calculation of J diverges from that of Rickenmann (1999): J is calculated using the formula proposed by IMHE (1994):

\(E_{1}, E_{2}, \ldots, E_{i-1}, E_{i}\) are the elevations of each break point in the movement path (m). Elevation was obtained from a 12.5 m digital elevation model (DEM), downloaded from https://search.asf.alaska.edu/#/. \(L_{1}, L_{2}, \ldots, L_{i-1}, L_{i}\) are the lengths of each section of the movement path (m). n is the number of path sections. \(E_{0}\) is the elevation of the endpoint of mass movement (m), and L is the length of the travel path (m). The divided sections are presented in Fig. 6.
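
The gradient calculation can be illustrated with a minimal Python sketch. It assumes the equal-area form of the weighted mean gradient, \(J = \left[\sum_{i=1}^{n}(E_{i-1}+E_{i})L_{i} - 2E_{0}L\right]/L^{2}\), which is consistent with the variable definitions above but is an assumption here, not a confirmed transcription of the IMHE (1994) equation. The function name `mean_gradient`, the ordering of the elevations (from the endpoint upward) and taking L as the sum of the section lengths are likewise illustrative choices.

```python
from typing import Sequence


def mean_gradient(elevations: Sequence[float], lengths: Sequence[float]) -> float:
    """Equal-area mean gradient of a travel path (assumed formula).

    elevations: E_0, E_1, ..., E_n in metres, starting at the endpoint of the
                mass movement and moving up the path (assumed ordering).
    lengths:    L_1, ..., L_n in metres, one per path section.
    """
    if len(elevations) != len(lengths) + 1:
        raise ValueError("need one more elevation than path sections")
    total_length = sum(lengths)  # L is assumed to be the sum of all sections
    if total_length <= 0:
        raise ValueError("total path length must be positive")
    e0 = elevations[0]
    # Sum of (E_{i-1} + E_i) * L_i over all sections
    weighted = sum(
        (elevations[i] + elevations[i + 1]) * lengths[i]
        for i in range(len(lengths))
    )
    return (weighted - 2.0 * e0 * total_length) / total_length ** 2
```

For a path with a uniform slope, this form reduces to the simple ratio of drop height to path length, which provides a quick sanity check of the implementation.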

### 3.2 A three-step analysis

\(R_{i}^{2}\) denotes the coefficient of determination of the regression model in which \(X_{i}\) is the dependent variable and the other input data are the independent variables. Following the first two steps of the analysis, the importance of each variable in contributing to the travel distance can be initially assessed.

However, further data processing remains crucial due to intercorrelations among these factors. Additionally, redundant information within the data should be removed, since it increases the difficulty of the analysis (Chaib et al. 2015). Therefore, PCA was introduced, using the Origin software, to reduce the data dimension and eliminate the correlation between factors. This method seeks to generate new indices, termed ‘principal components’, which encapsulate the most essential information in the data. The fundamental PCA process comprises several steps: (1) normalize the multi-dimensional data matrix; (2) calculate the eigenvalues and eigenvectors of the covariance matrix of the normalized data; (3) arrange the eigenvalues and eigenvectors in descending order; (4) select the first k values based on the cumulative contributions; and finally, (5) generate a new k-dimensional matrix through dimension reduction. The whole process can be described as:

\(V_{1}, V_{2}, \ldots, V_{m}\)), contributes to the generation of the principal components V:

\(V_{1}, V_{2}, \ldots, V_{m}\). To further stabilize the model input and ease data processing, the three generated principal components are normalized into the range of [0.01, 0.99] based on the equation:

\(x_{nor}\) represents the normalized data derived from x. U and L are the upper and lower normalization bounds, respectively.
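
The PCA steps and the subsequent rescaling can be sketched in Python with NumPy. This is an illustration only: the study itself used the Origin software, and the rescaling below assumes a standard min-max form, \(x_{nor} = L + (U - L)(x - x_{min})/(x_{max} - x_{min})\), which is not confirmed by the text. The function names `pca_reduce` and `rescale` are hypothetical.

```python
import numpy as np


def pca_reduce(X, k):
    """Project X (samples x factors) onto its first k principal components."""
    X = np.asarray(X, dtype=float)
    # (1) Normalize each factor to zero mean and unit variance
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    # (2) Eigen-decomposition of the covariance matrix of the normalized data
    eigvals, eigvecs = np.linalg.eigh(np.cov(Z, rowvar=False))
    # (3) Sort eigenvalues and eigenvectors in descending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # (4) Cumulative contribution of the leading components
    cumulative = np.cumsum(eigvals) / eigvals.sum()
    # (5) Reduce to k dimensions
    scores = Z @ eigvecs[:, :k]
    return scores, cumulative


def rescale(x, lower=0.01, upper=0.99):
    """Column-wise min-max rescaling into [lower, upper] (assumed form)."""
    x = np.asarray(x, dtype=float)
    x_min, x_max = x.min(axis=0), x.max(axis=0)
    # Note: a constant column would give a zero denominator here
    return lower + (upper - lower) * (x - x_min) / (x_max - x_min)
```

In this sketch, `cumulative` can be inspected to decide how many components to keep before calling `rescale` on the retained scores.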

### 3.3 Development of a machine learning model

### 3.4 Model assessment

\(y_{ipre}\) represents the estimation results, and \(y_{i}\) is the actual value. n is the number of estimation values. The closer the calculated RMSE, MAE, and MAPE are to 0, the better the model. Moreover, to further reveal the contribution of each variable to estimating the travelling distance, one variable is removed from model development at a time to generate five estimation models, and the RMSE and MAE of each model are calculated. The ratio of RMSE to MAE is also calculated for each model: because RMSE penalizes large errors more heavily than MAE, abnormal values that destabilize the output inflate this ratio, so it reflects the model’s stability.
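
The three indices and the stability ratio can be written out as a short Python sketch. It assumes the standard definitions of RMSE, MAE, and MAPE (the latter expressed in percent); the function names are illustrative.

```python
import math
from typing import Sequence


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error."""
    n = len(y_true)
    return math.sqrt(sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error."""
    n = len(y_true)
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / n


def mape(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; actual values must be non-zero."""
    n = len(y_true)
    return 100.0 * sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / n


def rmse_mae_ratio(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """RMSE / MAE; equals 1 when all absolute errors are identical, larger otherwise."""
    return rmse(y_true, y_pred) / mae(y_true, y_pred)
```

Since RMSE is never smaller than MAE, the ratio is at least 1, and it grows as a few large errors come to dominate the total.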

## 4. Result analysis

### 4.1 Determination of input variables

\(V_{L}\)). As for the other variables, J and C, a stronger correlation is observed between J and L, reaching 0.545. C displays a correlation value of 0.401 with L. Conversely, NDVI demonstrates a weak correlation with travel distance, leading to its exclusion from the model development process.

\(V_{L}\), H, J, and C. The calculated results, as presented in Table 1, clearly indicate the absence of multi-collinearity among these variables, as all TOL values exceed the threshold of 0.1. Furthermore, no VIF values exceed 100, affirming the suitability of all four variables for inclusion in the model development process.

Table 1 Collinearity indexes (TOL and VIF) of the selected factors

Factors | TOL | VIF |
---|---|---|
Volume of failure mass (\(V_{L}\)) | 0.743 | 1.346 |
Height difference between centre of source area and endpoint of mass movement (H) | 0.528 | 1.894 |
Mean gradient of travelling path (J) | 0.692 | 1.445 |
Mean curvature of travelling path (C) | 0.866 | 1.155 |
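
For reference, the two collinearity indexes can be computed as in the following Python sketch, assuming the standard definitions \(TOL_{i} = 1 - R_{i}^{2}\) and \(VIF_{i} = 1/TOL_{i}\), where \(R_{i}^{2}\) comes from regressing factor i on the remaining factors. The tabulated values are consistent with this reciprocal relation (for example, 1/0.743 ≈ 1.346). The function name `tolerance_and_vif` is illustrative and the sketch is not the authors' implementation.

```python
import numpy as np


def tolerance_and_vif(X):
    """Return (TOL, VIF) arrays for the columns of X (samples x factors)."""
    X = np.asarray(X, dtype=float)
    n_samples, n_factors = X.shape
    tol = np.empty(n_factors)
    for i in range(n_factors):
        y = X[:, i]
        # Regress factor i on all other factors plus an intercept
        others = np.delete(X, i, axis=1)
        A = np.column_stack([np.ones(n_samples), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residual = y - A @ coef
        ss_res = residual @ residual
        ss_tot = ((y - y.mean()) ** 2).sum()
        # TOL_i = 1 - R_i^2 = SS_res / SS_tot
        tol[i] = ss_res / ss_tot
    return tol, 1.0 / tol
```

Factors that are nearly independent of one another give TOL and VIF values close to 1, while a factor that is almost a linear combination of the others gives a TOL close to 0 and a correspondingly large VIF.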

### 4.2 Estimation of travel distance and evaluation of model performance

### 4.3 Sensitivity analysis

Model 1 (\(V_{L}\) + H + J + C), Model 2 (\(V_{L}\) + H + J), Model 3 (\(V_{L}\) + H + C), Model 4 (H + J + C), Model 5 (\(V_{L}\) + H), Model 6 (H + J), Model 7 (H + C), Model 8 (\(V_{L}\) + J), Model 9 (\(V_{L}\) + C), Model 10 (J + C), Model 11 (H), Model 12 (\(V_{L}\)), Model 13 (J), and Model 14 (C). After that, we tested the estimation accuracy of the 15 models based on the RMSE, MAE, and MAPE indices. Before conducting the sensitivity analysis, we plotted the estimation results of the PCA model and Model 1 to test the efficiency of the PCA method in removing noise and thereby increasing estimation accuracy (Fig. 9). As indicated in Fig. 9, the PCA model performs better than Model 1, since the estimation results of Model 1 diverge more from the measured values.
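
Factor combinations of this kind can be generated programmatically, as in the minimal Python sketch below. The factor labels and the function name `factor_subsets` are illustrative, and the enumeration order does not follow the model numbering used in the text. Note that a full enumeration of four factors yields 15 non-empty subsets, one more than the 14 factor-based models named above, because the \(V_{L}\) + J + C combination is not among them.

```python
from itertools import combinations

# Illustrative labels for the four retained factors
FACTORS = ("V_L", "H", "J", "C")


def factor_subsets(factors=FACTORS):
    """Return every non-empty subset of the factors, largest subsets first."""
    subsets = []
    for size in range(len(factors), 0, -1):
        subsets.extend(combinations(factors, size))
    return subsets


for number, subset in enumerate(factor_subsets(), start=1):
    print(f"subset {number}: {' + '.join(subset)}")
```

Each subset would then be used to train and score a separate estimation model with the same indices.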

\(V_{L}\) and H. A significant percentage reduction in MAPE, reaching 79.2%, is found when the H factor in Model 6 is replaced by \(V_{L}\) (Model 8). Model 10 exhibits the smallest MAPE value due to a combination of J and C. Furthermore, the MAPE value ranges from 45.8% to 79.3% when only H, \(V_{L}\), J, or C is used for model development. This underscores the pivotal roles of \(V_{L}\) and H as the main control factors, determining the potential energy and the distance the failure mass can travel (Lo

## 5. Comparison with existing empirical equations

Source | Equations | Dataset |
---|---|---|
(Rickenmann 1999) | \(L = 1.9\,M^{0.16} H_{e}^{0.83}\) (11) | Italy, Japan, China, Switzerland, U.S.A., Columbia |
(Lorente et al. 2003) | \(L = 7.13\left(M H_{e}\right)^{0.271}\) (12) | Central Spanish Pyrenees |
(Hürlimann et al. 2015) | \(L = 7.48\,V^{0.45}\) (13) | Switzerland |
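
The three empirical equations can be evaluated with a few lines of Python. In this sketch, M and V are assumed to be the debris-flow volume (m³) and \(H_{e}\) the elevation difference along the flow path (m), with L returned in metres; the symbols are not defined in the text above, so these interpretations and the function names are assumptions.

```python
def travel_distance_rickenmann(M: float, H_e: float) -> float:
    """Eq. (11): L = 1.9 * M^0.16 * H_e^0.83 (Rickenmann 1999)."""
    return 1.9 * M ** 0.16 * H_e ** 0.83


def travel_distance_lorente(M: float, H_e: float) -> float:
    """Eq. (12): L = 7.13 * (M * H_e)^0.271 (Lorente et al. 2003)."""
    return 7.13 * (M * H_e) ** 0.271


def travel_distance_hurlimann(V: float) -> float:
    """Eq. (13): L = 7.48 * V^0.45 (Hürlimann et al. 2015)."""
    return 7.48 * V ** 0.45
```

Because each equation was fitted to a different regional dataset, the three functions can return noticeably different distances for the same inputs.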

## 6. Discussion and limitations

## 7. Conclusion

\(V_{L}\), H, J, C, and NDVI. After that, a correlation analysis is conducted to analyze the correlation between each variable and travel distance, and NDVI is removed because it presents a weak correlation with travel distance. Then, the multi-collinearity among the remaining four variables is investigated. Furthermore, these four variables are used to generate principal components, PC1, PC2 and PC3, to reduce the dimension of the input data and ensure model stability.

Model 1 (\(V_{L}\) + H + J + C), Model 2 (\(V_{L}\) + H + J), Model 3 (\(V_{L}\) + H + C), Model 4 (H + J + C), Model 5 (\(V_{L}\) + H), Model 6 (H + J), Model 7 (H + C), Model 8 (\(V_{L}\) + J), Model 9 (\(V_{L}\) + C), Model 10 (J + C), Model 11 (H), Model 12 (\(V_{L}\)), Model 13 (J), and Model 14 (C). The performance of these models was evaluated using the same three indexes. The results show the necessity of incorporating all four factors into model development if high accuracy is expected. The factor combination proposed in our study is suitable for estimating the travel distance of debris flows after an earthquake. Finally, we compared the estimation model with existing empirical equations. Our proposed model performs the best because its estimation results are the closest to the actual values. Therefore, this model can effectively estimate the travel distance of debris flows after an earthquake, although slight fluctuations in estimation accuracy may be inevitable if the model is applied to other areas with different topographic conditions.