1 Introduction

Approximately four decades ago, Robbins [45] stated “nothing has been more difficult than evaluating the rock mass characteristics and applying the evaluations to a formula predicting penetration rate”. Despite huge technological developments, predicting rock mass characteristics and the rate of penetration of TBMs is still a challenging problem for tunnel engineers. Meanwhile, the lack of living space and the growing population have led to a construction boom in underground space aimed at improving the quality of human life [54]. Economic growth has also driven engineering projects that not only significantly reduce transportation time but also provide comfortable transportation options [37]. In recent decades, mechanized tunnelling techniques, particularly tunnel boring machines (TBMs), have been extensively applied to tunnel construction because of their high excavation rate and low total cost for long tunnels [39]. In other words, TBM tunnelling offers significant advantages for long tunnels, provided that the geological and geotechnical conditions along the tunnel route are characterized correctly and a machine suited to the ground conditions is selected.

A reliable and accurate prediction of tunnel boring machine (TBM) performance can help minimize the risks associated with high capital costs and assist in scheduling tunnelling projects [68, 69]. For TBM boring in hard rock, the most important aspect of the operation is the prediction of the rate of penetration (ROP) [49]. However, predicting the construction duration of long tunnels in complex geological and geotechnical conditions is not an easy task because of high uncertainty. Zhou et al. [68] applied six machine learning methods to predict ROP and found that the hybrid particle swarm optimization and extreme gradient boosting model had the best overall performance of the six. Minh et al. [42] used uniaxial compressive strength (UCS), Brazilian tensile strength (BTS), rock brittleness index (BI), the distance between planes of weakness (DPW) and the alpha angle (α) between the tunnel axis and the planes of weakness to predict ROP, and suggested that fuzzy logic and other artificial intelligence techniques are very good alternatives for predicting ROP. Jung et al. [28] predicted the ground conditions ahead of the tunnel face, regardless of site conditions, using shield TBM operational data acquired during excavation. Similarly, Zhang et al. [62] predicted the geological conditions using TBM operational data. Armetti et al. [6] assessed TBM performance data using various intact rock and rock mass properties. Salimi et al. [48] presented a brief review of the application of common rock mass classification systems to TBM performance prediction and developed a new TBM performance prediction model based on the input parameters of the RMR system.
Farrokh [14] reviewed and compared several mainstream advance rate estimation models for hard rock TBMs by evaluating their predictive abilities on a database of performance parameters from 17 recent tunnel projects. Additionally, several researchers have used various prediction algorithms, such as artificial neural networks [35, 36, 43, 52, 55, 57, 66, 68, 69], fuzzy or neuro-fuzzy inference systems [1, 21], metaheuristic algorithms [70] and multiple regression [12, 15, 17, 20, 22, 23, 24, 25, 26, 33, 56, 57], to estimate TBM performance. Jing et al. [27] proposed a TBM advance rate prediction model that considers operational factors. As this brief literature summary shows, TBM performance prediction is one of the most important research subjects for tunnel engineers because the problem has not yet been completely solved. Owing to the complexity of geological and geotechnical conditions along tunnel routes, ROP is difficult to predict. However, with the accumulation of well-documented data and advances in prediction algorithms, more understandable and applicable estimation equations and models have started to emerge.

With the development of railway systems in Turkey, several railway construction projects are ongoing. One of these projects is the Bahce–Nurdag twin-tube tunnels in southern Turkey, which are the longest railway tunnels in the country. TBM excavation started from Nurdag (Gaziantep), and one of the tubes was completed in Bahce (Osmaniye) in 2020. The geological characteristics of the Bahce–Nurdag Railway Tunnels route are extremely complex because the route is located in the active East Anatolian Fault Zone. For long tunnels excavated by TBMs in complex geological conditions, the most important issue is the prediction of construction time, and the most important parameter for this is the rate of penetration (ROP). Consequently, the purpose of the present study is to develop prediction equations and models using data collected from one of the longest railway tunnels in Turkey (the Bahce–Nurdag Railway Tunnel). Each tube is approximately 10 km long, but the data were collected along 8 km because the remaining parts of the tunnel were excavated using the NATM. During excavation, the weathering degree and water conditions were observed and measured directly, whereas other parameters, such as the Cerchar Abrasivity Index, uniaxial compressive strength and alpha angle, were determined from borehole and laboratory data. Regression analyses and artificial neural networks were used to analyse the data, and the results are presented and discussed.

2 Geological and geotechnical conditions of tunnel route

The Bahce–Nurdag tunnel route is located on the border between Osmaniye and Gaziantep in southeastern Turkey (Fig. 1). After completion of the Bahçe–Nurdag tunnels, the railway distance between Bahçe and Nurdag will decrease from 32 to 15 km. The region is densely populated, and the total population of Gaziantep, Hatay, Osmaniye and Adana, which will directly benefit from this project, is approximately 6.5 million. In addition, the Iskenderun Iron and Steel Plant, one of Turkey’s most important industrial facilities, is located in the region, and transporting the steel produced there and other industrial products by rail will bring great economic benefits. For this reason, the Bahçe–Nurdag tunnels are of great importance for both local passenger travel and the transportation of industrial products.

Fig. 1

Location map of the Bahce–Nurdag tunnels

Turkey is located in the Alpine–Himalayan earthquake belt, and because of this tectonic setting, tunnel construction in Turkey faces several serious geological and geotechnical problems [7, 8]. The Bahçe–Nurdag tunnels are located in the active East Anatolian Fault Zone (EAFZ) (Fig. 2). The EAFZ is one of the most seismically active zones in Turkey because it is a plate boundary extending over 500 km between the Arabian and Anatolian plates [11]. However, the EAFZ has been relatively quiescent in the last century compared with historical records and has therefore accumulated significant stress along its length [44]. In particular, the segment near the project area has a high seismic risk, and Nalbant et al. [44] predicted the magnitude of a possible future earthquake to be approximately 7.3. Given this seismic risk, construction duration is the most important factor for the project, because a major earthquake during construction could seriously damage the tunnel. The geological map of the tunnel route and its close vicinity is shown in Fig. 3. Along the tunnel route, various metamorphic units, such as metasandstone, quartzite, schist and slate, were encountered (Fig. 4). Some parts of the tunnel route consist of a single unit, while others consist of metasandstone–slate or metasandstone–slate–quartzite alternations. The longitudinal cross section of the tunnel is shown in Fig. 4. As shown in Fig. 4, the tunnel encountered significant groundwater, and a water inflow of 50 l/s was measured, although some parts were occasionally dry. The overburden thickness reaches 640 m.

Fig. 2

The Bahçe–Nurdag tunnel location on the seismotectonic map of the East Anatolian Fault Zone (the map was taken from [44])

Fig. 3

Geological map of the tunnel route and its close vicinity [37]

Fig. 4

Longitudinal cross section of the Bahce–Nurdag Tunnel (section includes the TBM part) (modified after Fugro Sial Inc. [18])

During the geological and geotechnical investigation phase of the project, a total of 15 geotechnical boreholes were drilled by Fugro Sial Inc. [18]. The depths of these boreholes vary between 40 and 435 m. The average geotechnical data obtained from these boreholes and laboratory tests are summarized in Table 1.

Table 1 Some average geotechnical parameters of the lithological units (compiled from Fugro Sial Inc. [18])

3 TBM characteristics and data identification

The customized single-shield TBM (Fig. 5), with a diameter of 8 m, was designed by Robbins with its Difficult Ground Solutions (DGS) (Robbins Inc., 2020). The TBM has 10 motors, each rated at 330 kW, which together produce a torque of 14,453 kNm. The excavated material is removed from the tunnel by a conveyor belt system (Fig. 6). Along 8 km, 402,000 m³ of metamorphic rock mass was excavated by the TBM (Fig. 5) and transported by the conveyor belt system (Fig. 6).
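As a rough consistency check on the figures above, the solid volume of an 8 m diameter circular bore over 8 km can be computed directly. This is a minimal sketch that assumes a perfectly circular cross section with no overbreak or bulking; the function name `bore_volume_m3` is hypothetical.

```python
import math


def bore_volume_m3(diameter_m: float, length_m: float) -> float:
    """Solid volume of a circular bore, ignoring overbreak and bulking."""
    area_m2 = math.pi * diameter_m ** 2 / 4.0  # cross-sectional area (m2)
    return area_m2 * length_m


volume = bore_volume_m3(8.0, 8000.0)
total_power_kw = 10 * 330  # 10 motors x 330 kW each

print(f"{volume:,.0f} m3")     # 402,124 m3
print(f"{total_power_kw} kW")  # 3300 kW
```

The computed volume agrees with the reported 402,000 m³ to within 0.1%.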

Fig. 5

The TBM used in Bahce–Nurdag tunnel

Fig. 6

The conveyor belt system of the TBM

As mentioned previously, several prediction models have been developed using various TBMs and intact rock and rock mass parameters. Clearly, the parameters used in an ROP prediction model should be easy to determine and reliable. One of the commonly used parameters is the α angle, defined as the angle between the TBM axis and the planes of weakness; generally, the maximum TBM penetration rate occurs when this angle is approximately 60° [66]. Vergala and Saroglou [53] proposed a new field penetration index for mixed-face ground conditions (MFPI) and found that an increase in the weighted rock mass rating, RMRm, resulted in an increase in the mixed-face field penetration index. Salimi et al. [47] noted that rock boreability decreases as UCS increases. Afradi et al. [3] used a comprehensive database including uniaxial compressive strength, Brazilian tensile strength, RQD, cohesion, elasticity modulus, Poisson’s ratio, density, joint angle and joint spacing as input parameters for estimating the penetration rate. Mahdevari et al. [41] employed uniaxial compressive strength, tensile strength, brittleness index, distance between planes of weakness, alpha angle and machine parameters to develop a TBM performance model. Parametric and sensitivity analyses of common prediction models with respect to their input variables indicate that uniaxial compressive strength is the most influential parameter across all models [16]. Torabi et al. [51] obtained similar results for uniaxial compressive strength.

Prediction of the ROP before the excavation phase is important because the main purpose of these models is to estimate the TBM completion time. For this reason, prediction models should be developed from the geological–geotechnical parameters that affect TBM excavation. It is also important that these parameters can be obtained from the geological cross section and the boreholes. After a tunnel is completed with a TBM, many TBM parameters are available; however, some of them, such as thrust, torque and energy consumption [59], cannot be known before excavation. Nevertheless, some operational parameters such as torque are used for TBM performance estimation alongside geological–geotechnical parameters; for example, Zhao et al. [67] proposed a TBM performance prediction method based on the Mixed-face Torque Penetration Index and torque capacity. In this study, only parameters available before excavation were used in the models. In other words, after the geological cross section was prepared, drilling and laboratory data representing each unit were used as input parameters. Accordingly, α, uniaxial compressive strength (UCS), weathering degree (W), water conditions (WatInflow) and the Cerchar Abrasivity Index (CAI) are used to predict ROP. Statistical summaries of the inputs and the output are given in Table 2. The TBM excavated an 8000 m tunnel, and each parameter was determined at every 1.5 m of advance; hence, the database includes 5334 cases. The ROP values vary between 5.5 and 114.5 mm/min depending on the geological and geotechnical conditions along the tunnel route.
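The case count can be reproduced under an assumption not stated in the text: that records were taken at chainages 0, 1.5, 3.0, ... m up to the 8000 m limit. The same snippet checks that the training, testing and validation sizes reported for the ANN models (3734, 800 and 800 cases) sum to the database size and correspond to roughly a 70/15/15% split. The helper name `station_count` is hypothetical.

```python
import math


def station_count(length_m: float, interval_m: float) -> int:
    """Number of records at chainages 0, interval, 2*interval, ... <= length."""
    return math.floor(length_m / interval_m) + 1


n_cases = station_count(8000.0, 1.5)
print(n_cases)  # 5334

# Training/testing/validation sizes reported for the ANN models
n_train, n_test, n_val = 3734, 800, 800
assert n_train + n_test + n_val == n_cases
print([round(100 * n / n_cases, 1) for n in (n_train, n_test, n_val)])  # [70.0, 15.0, 15.0]
```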

Table 2 Statistical summaries of the input and output parameters

The α angle is used to quantify the influence of discontinuity geometry on TBM performance and is calculated with the following equation [10]:

$$\alpha = \arcsin\left(\sin\alpha_{\mathrm{f}} \times \sin(\alpha_{\mathrm{t}} - \alpha_{\mathrm{s}})\right)$$

where αf = dip of discontinuity (degree), αs = strike of discontinuity (degree), αt = direction of tunnel (degree).
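The α equation can be evaluated as follows. This is a minimal sketch: it returns the magnitude of α on the assumption that the angle is reported between 0° and 90°, and the function name and the example orientations are hypothetical.

```python
import math


def alpha_angle_deg(dip_deg: float, strike_deg: float, tunnel_dir_deg: float) -> float:
    """alpha = arcsin(sin(alpha_f) * sin(alpha_t - alpha_s)), angles in degrees.

    Returns |alpha| (assumed convention: 0 deg = planes parallel to the
    tunnel axis, 90 deg = planes perpendicular to it).
    """
    s = math.sin(math.radians(dip_deg)) * math.sin(math.radians(tunnel_dir_deg - strike_deg))
    return abs(math.degrees(math.asin(s)))


# Hypothetical orientations: dip 60 deg, strike N30E, tunnel driven towards N120E
print(round(alpha_angle_deg(60.0, 30.0, 120.0), 1))  # 60.0
```

When the planes strike perpendicular to the tunnel axis, α equals the dip; when they strike parallel to it, α is zero regardless of dip.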

4 Multiple regression analyses

When a dependent variable is controlled by two or more independent variables, multiple regression analysis is used. Multiple regression analyses have been widely used in geotechnical practice, for example, to estimate the UCS [19, 58, 60, 71], predict rock mass permeability [30, 32], predict the deformation modulus of rock masses [4, 31] and predict TBM performance [22, 57]. In this study, a series of simple regression analyses was performed before the multiple regression analyses to check for multicollinearity. The correlation coefficients obtained from the simple regression analyses are summarized in Table 3. The correlation coefficient (r) is defined as follows:

$$r = \frac{s_{xy}}{s_{x}\, s_{y}}$$

where sx and sy are the sample standard deviations and sxy is the sample covariance.
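A direct implementation of this definition, using sample statistics with an (n − 1) denominator, is sketched below; because the (n − 1) factors cancel, the result is the usual Pearson coefficient. The function name is hypothetical and the example values are toy numbers.

```python
import math


def pearson_r(x, y):
    """r = s_xy / (s_x * s_y) with sample (n - 1) statistics."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = sum(x) / n, sum(y) / n
    s_xy = sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)  # sample covariance
    s_x = math.sqrt(sum((a - mx) ** 2 for a in x) / (n - 1))          # sample std of x
    s_y = math.sqrt(sum((b - my) ** 2 for b in y) / (n - 1))          # sample std of y
    return s_xy / (s_x * s_y)


print(pearson_r([1, 2, 3], [1, 3, 2]))  # 0.5
```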

Table 3 Coefficients of correlation (R) obtained from the simple regression analyses

As seen from Table 3, there is a strong correlation between CAI and UCS. Similarly, several authors (e.g. [13, 29, 34, 65]) investigated the relation between CAI and UCS and found meaningful correlations between these two parameters. Hence, UCS and CAI should not be used together in the same model, because doing so would introduce multicollinearity. Among the other parameters, the relationships are either very weak or absent. The relationships between ROP and the independent variables are almost linear. For this reason, linear and nonlinear multiple regression analyses were performed using the IBM SPSS Statistics package.
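The screening step can be sketched as follows, assuming the predictor-predictor coefficients of a table like Table 3 are available as a dictionary. The sketch flags pairs whose |r| exceeds a threshold (0.7 here, an illustrative choice not taken from the study) and reports the two-predictor variance inflation factor, VIF = 1/(1 − r²). The function name and the r values in the example are hypothetical placeholders, not the values in Table 3.

```python
def flag_collinear_pairs(corr, threshold=0.7):
    """Return (var_a, var_b, r, vif) for predictor pairs with |r| > threshold.

    corr maps (var_a, var_b) -> Pearson r between the two predictors.
    vif is the variance inflation factor if only this pair were in the model.
    """
    flagged = []
    for (a, b), r in corr.items():
        if abs(r) > threshold:
            vif = float("inf") if abs(r) >= 1 else 1.0 / (1.0 - r * r)
            flagged.append((a, b, r, vif))
    return flagged


# Hypothetical predictor-predictor correlations (not the values of Table 3)
example = {("UCS", "CAI"): 0.9, ("UCS", "W"): -0.2, ("alpha", "W"): 0.05}
for a, b, r, vif in flag_collinear_pairs(example):
    print(f"{a}-{b}: r={r:.2f}, VIF={vif:.1f}")  # UCS-CAI: r=0.90, VIF=5.3
```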

Two models were run in the multiple regression analyses. The first includes WatInflow, W, α and UCS as independent variables, while the second includes WatInflow, W, α and CAI. For the first model, the correlation coefficient between the measured and predicted ROP values is 0.59 for both the nonlinear and linear multiple regressions; therefore, the linear regression equation (Eq. 1) is preferred, and its cross-correlation graph is shown in Fig. 7a. The regression equation of the second model is given in Eq. (2), its cross-correlation graph is shown in Fig. 7b, and its correlation coefficient is 0.56.
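The study performed these regressions in IBM SPSS Statistics. As a hedged illustration of what a linear multiple regression of the form given in Eq. (1) computes, the NumPy sketch below fits ROP = b0 + b1·UCS + b2·α + b3·W + b4·WatInflow by ordinary least squares and reports the correlation between measured and fitted values. The data are synthetic: inputs are drawn at random (the 1–5 weathering scale and the input ranges are hypothetical) and ROP is generated from the Eq. (1) coefficients plus noise, only to show that the fit recovers them. The function name `fit_linear_mr` is hypothetical.

```python
import numpy as np


def fit_linear_mr(X, y):
    """Ordinary least squares fit of y = b0 + X @ b.

    Returns (b0, b, r), where r is the correlation coefficient between the
    measured and fitted y values (the quantity in a cross-correlation plot).
    """
    A = np.column_stack([np.ones(len(y)), X])  # prepend an intercept column
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    y_hat = A @ beta
    r = np.corrcoef(y, y_hat)[0, 1]
    return beta[0], beta[1:], r


# Synthetic illustration only (not the Bahce-Nurdag records)
rng = np.random.default_rng(0)
n = 500
X = np.column_stack([
    rng.uniform(10, 150, n),  # UCS (MPa), hypothetical range
    rng.uniform(0, 90, n),    # alpha (degrees)
    rng.integers(1, 6, n),    # W, hypothetical 1-5 weathering scale
    rng.uniform(0, 50, n),    # WatInflow (l/s), hypothetical range
])
true_beta = np.array([0.011, 0.164, -7.2, 0.56])
y = 53.5 + X @ true_beta + rng.normal(0.0, 1.0, n)  # Gaussian noise, sd 1 mm/min

b0, b, r = fit_linear_mr(X, y)
print(round(b0, 2), np.round(b, 3), round(r, 3))
```

With real field data the residual scatter is much larger than in this synthetic case, which is why correlation coefficients such as 0.59 and 0.56 arise.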

$${\text{ROP}} = 0.011\,{\text{UCS}} + 0.164\,\alpha - 7.2\,W + 0.56\,{\text{WatInflow}} + 53.5$$
(1)
$${\text{ROP}} = 0.093\,{\text{CAI}} + 0.166\,\alpha - 7.3\,W + 0.57\,{\text{WatInflow}} + 54.5$$
(2)

where ROP = rate of penetration (mm/min), UCS = uniaxial compressive strength (MPa), α = alpha angle (degree), W = weathering degree, WatInflow = water inflow (l/s), CAI = Cerchar Abrasivity Index.
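For quick use, Eqs. (1) and (2) can be written as plain functions. This is a sketch: the function names and the input values in the example are hypothetical, W must follow the weathering classification used in the study, and inputs should stay within the ranges of Table 2 because the equations were calibrated only on that data.

```python
def rop_model1(ucs_mpa, alpha_deg, w, wat_inflow_ls):
    """Eq. (1): ROP (mm/min) from UCS (MPa), alpha (deg), W and water inflow (l/s)."""
    return 0.011 * ucs_mpa + 0.164 * alpha_deg - 7.2 * w + 0.56 * wat_inflow_ls + 53.5


def rop_model2(cai, alpha_deg, w, wat_inflow_ls):
    """Eq. (2): same form as Eq. (1), with CAI in place of UCS."""
    return 0.093 * cai + 0.166 * alpha_deg - 7.3 * w + 0.57 * wat_inflow_ls + 54.5


# Hypothetical input values, chosen only to show usage
print(round(rop_model1(100, 45, 2, 10), 2))  # 53.18
print(round(rop_model2(3, 45, 2, 10), 2))    # 53.35
```

The small UCS and CAI coefficients mean that, within typical ranges, these two inputs shift the predicted ROP by only a few mm/min, while W and WatInflow dominate.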

Fig. 7

Cross-correlation graphs between predicted and measured ROPs: (a) model 1 and (b) model 2

The equations obtained from the multiple regression analyses are statistically meaningful; however, their predictions for extreme ROP values are less accurate. The model including UCS is slightly better than the model including CAI. In general, the equations provide good predictions of the average values. In addition, the two equations are similar because of the strong relationship between CAI and UCS.

5 Artificial neural networks

Accurately predicting tunnel boring machine (TBM) performance is important for safe and efficient tunnelling; however, applying machine learning algorithms to TBM performance prediction poses several challenges [63]. Armaghani et al. [5] applied several optimization techniques to estimate the TBM advance rate in granitic rocks. In addition to traditional methods, intelligent methods such as artificial neural networks (ANNs) have been applied to various problems in the tunnelling domain in recent years [54]. In the present study, two ANN models are developed using a large database collected over 5 years from 8000 m of tunnel excavation. The success of an ANN model depends on the size of the database. In this study, a database of 5334 cases was used for ANN modelling in MATLAB R2020a. A total of 3734 cases were used for training, 800 for testing and 800 for validation. The Levenberg–Marquardt learning algorithm was employed during training. Yu and Wilamowski [61] stated that “the Levenberg–Marquardt algorithm [38, 40] provides a numerical solution to the problem of minimizing a nonlinear function. It is fast and has stable convergence. In the artificial neural networks field, this algorithm is suitable for training small- and medium-sized problems”. The Levenberg–Marquardt algorithm blends the steepest descent method and the Gauss–Newton algorithm [61]. The general structure of the ANN models constructed in the study is shown in Fig. 8.
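To illustrate how the Levenberg–Marquardt update blends Gauss–Newton and steepest descent, the pure-Python sketch below applies it to a two-parameter curve fit, y = a·exp(b·x), rather than to neural-network weights; it is not the MATLAB implementation used in the study. Each step solves (JᵀJ + μI)δ = Jᵀe, where e is the residual vector and J the Jacobian of the model. The initial damping factor μ = 0.01, the factor of 10 used to adjust it and the function name `lm_fit_exp` are illustrative assumptions.

```python
import math


def lm_fit_exp(xs, ys, a, b, mu=1e-2, iters=100):
    """Fit y = a * exp(b * x) by Levenberg-Marquardt.

    Each step solves (J^T J + mu * I) delta = J^T e, where e = y - model and
    J holds the partial derivatives of the model with respect to (a, b).
    Small mu gives a Gauss-Newton step; large mu gives a short step along
    the steepest-descent direction.
    """
    def sse(a, b):
        return sum((y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys))

    cost = sse(a, b)
    for _ in range(iters):
        # Accumulate the 2x2 matrix J^T J and the vector J^T e
        h11 = h12 = h22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            ex = math.exp(b * x)
            ja = ex            # d(model)/da
            jb = a * x * ex    # d(model)/db
            e = y - a * ex
            h11 += ja * ja
            h12 += ja * jb
            h22 += jb * jb
            g1 += ja * e
            g2 += jb * e
        # Solve the damped 2x2 system with Cramer's rule
        m11, m22 = h11 + mu, h22 + mu
        det = m11 * m22 - h12 * h12
        da = (g1 * m22 - h12 * g2) / det
        db = (m11 * g2 - h12 * g1) / det
        new_cost = sse(a + da, b + db)
        if new_cost < cost:
            # Step accepted: reduce damping (towards Gauss-Newton)
            a, b, cost = a + da, b + db, new_cost
            mu /= 10.0
        else:
            # Step rejected: increase damping (towards steepest descent)
            mu *= 10.0
    return a, b


xs = [0.2 * i for i in range(11)]
ys = [2.0 * math.exp(0.5 * x) for x in xs]  # noise-free toy data
print(lm_fit_exp(xs, ys, a=1.0, b=0.2))
```

For an ANN, J becomes the Jacobian of the network errors with respect to all weights and biases, so the size of JᵀJ grows with the square of the number of weights; this is consistent with the quoted remark that the algorithm suits small- and medium-sized problems.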

Fig. 8

General structure of the ANN models constructed in the study

Model 1 completes the learning stage at 142 iterations, while model 2 reaches the minimum MSE (mean squared error) value at 217 iterations. The cross-correlation between the measured and predicted ROP values for model 1 is shown in Fig. 9. The correlation coefficients for the training, testing, validation and all cases of model 1 are 0.84, 0.84, 0.83 and 0.84, respectively. These results show that the model with UCS, water conditions, weathering degree and α angle as inputs has a strong prediction capacity. In addition, the correlation coefficients of the training, testing and validation data are almost the same, which indicates that the model generalizes well.

Fig. 9

Cross-correlations between the predicted (output) and the measured (target) ROP values for model 1

The cross-correlation between the measured and predicted ROP values for model 2 is shown in Fig. 10. The correlation coefficients for the training, testing, validation and all cases of model 2 are 0.85, 0.83, 0.84 and 0.84, respectively. Compared with model 1, model 2 has slightly higher training and validation coefficients, a slightly lower testing coefficient and the same overall coefficient (0.84); both models therefore yield meaningful and promising results.

Fig. 10

Cross-correlations between the predicted (output) and the measured (target) ROP values for model 2

6 Results and discussion

One of the essential tasks in TBM tunnelling is the reliable estimation of performance, which is needed for planning, cost control and other decisions regarding the feasibility of tunnelling projects [2]. According to the extensive literature review by Samaei et al. [49], there is no comprehensive agreement on the quantitative or qualitative influence of the various variables on TBM performance, although prediction accuracy has improved in recent years through algorithms such as ANN, support vector machine, fuzzy and neuro-fuzzy methods. For this reason, reliable ROP prediction models have been an attractive subject for tunnel engineers. A recent study on penetration rate prediction by Bardhan et al. [9] discussed the existing penetration rate prediction models; the inputs of the models developed by Bardhan et al. [9] are uniaxial compressive strength, rock quality designation and distance between planes of weakness. In the present study, extensive observations were performed over 5 years to collect the data, and a database containing a large number of cases based on detailed observations was formed. Using this database, simple and multiple regression analyses and ANN modelling were performed to predict the ROP. It is important to select input parameters that are easy to determine and reliable. In addition, parameters that characterize the geological and geotechnical conditions of the tunnel route and directly affect the TBM were taken into consideration. In the first stage of the analyses, simple regression analyses were performed. According to their results, no strong correlation was obtained between ROP and the independent parameters, and no meaningful correlation was found among the independent parameters except for a strong correlation between UCS and CAI.
For this reason, UCS and CAI were not used in the same multiple regression or ANN model, to avoid multicollinearity. As a result, two different multiple regression models were constructed, each analysed with both linear and nonlinear multiple regression. Because the correlation coefficients of the linear and nonlinear forms are almost the same, the linear equations were selected for practical use. The correlation coefficients of the two multiple regression equations are similar to each other but not strong (0.59 and 0.56); however, considering the number of cases, the multiple regression equations can be used for ROP prediction. ANN models were also developed to predict ROP; the same input parameters as in the multiple regression equations were used, and two ANN models were constructed. Both ANN models outperformed the multiple regression equations (overall correlation coefficient of 0.84). Similarly, Bardhan et al. [9] stated that “TBM’s ROP estimation using theoretical and empirical models gives relatively low accuracy, and hence new methods need to be developed”. Other studies have also reported that soft computing algorithms give more successful results than regression models [2, 9, 35, 50, 57, 65]. Finally, the models developed in the present study can be used to estimate ROP in metamorphic rocks and deep tunnels using easily obtained geological and geotechnical parameters before the excavation phase. Another important result is that UCS and CAI have similar effects on TBM advancement in metamorphic rock media.

7 Conclusions

TBM tunnelling is a complex system, and when geological uncertainties are added, predicting the performance of a TBM in a long tunnel can become extremely difficult. Considering these uncertainties, a database containing a large number of cases was constructed in this study. The data employed are new because one tube of the Bahce–Nurdag tunnels was completed successfully in 2020. Various metamorphic rocks were encountered along the tunnel route, so all assessments in the study are valid for metamorphic rocks. In addition, the tunnel is deep, with an overburden of up to 640 m. The main conclusions of the study are as follows:

  (a)

    The reliability of a prediction model depends on the quality and quantity of the data. The database used in the study is formed by careful measurements and observations during tunnel excavation. In this study, 5334 cases were used to construct the prediction models.

  (b)

    The input or independent parameters should be easy to obtain; this is another important issue for an ROP prediction model. For this reason, UCS, CAI, α angle, weathering degree and water conditions were selected as the inputs or independent variables in this study. These variables directly affect TBM performance because each has a physical relation with TBM advancement. However, no meaningful individual relation was found between ROP and UCS, CAI, α angle or water conditions; only a weak relationship between ROP and weathering degree was obtained, and it is insufficient for predicting ROP. This result reveals that TBM advancement cannot be explained by a single variable. For this reason, multiple regression and ANN models with multiple inputs were developed.

  (c)

    Tunnels are linear engineering structures, and their construction is an extremely complex process. Therefore, highly sophisticated algorithms and a large number of input parameters could be used to estimate the ROP; however, such models are difficult to use in practice. For this reason, attention was devoted to developing models that are as simple as possible and use easily obtainable input parameters, so that the models developed in this study can be applied to other tunnels. In addition, the models developed in this study have greater generalization capacity than more complex models.

  (d)

    The developed multiple regression equations have a moderate prediction capacity; however, considering the number of cases and the characteristics of the independent variables, they can be used in preliminary investigation stages. In contrast, both ANN models show a high prediction capacity and can be used reliably to predict ROP before the construction of deep tunnels in metamorphic rock.

ROP prediction remains an important topic that is open to further development as new and reliable data become available. With developments in prediction algorithms, more reliable and higher-performance models will be developed in the near future, and studies on TBM performance prediction will therefore continue to increase.