Introduction
3D geological modeling, which relies on boreholes, shallow sections, in-situ test data, and geological outcrops, is vital for the interpretation of geological structures (Wang et al. 2020; Zhou et al. 2019) and serves as a basis for subsequent mechanical analysis (Chen et al. 2023), safety assessment (Chen et al. 2021), and construction planning of engineering projects (Aghamolaie et al. 2019; Schaaf and Bond 2019). However, owing to the complexity of geological conditions in most projects (Chen et al. 2022a), especially large-scale hydraulic engineering projects, and the limited extent of site investigation, geological interpretation carries great uncertainty. The modeling of ground and underground conditions is a challenging issue, especially in the case of complex geometries and spatially variable geotechnical properties (Petrone 2023; Antonielli 2023). Hence, it is necessary to use the limited primary data to predict the underground space and provide an immediate preliminary geological model, especially for the dam site. The exploration drilling plan is then guided by the uncertainty analysis, and the refined implicit characteristics are reconstructed. Explicit modeling mainly relies on deterministic methods for large-scale structures (Tian et al. 2023) and depends heavily on manual interpretation, which is prone to subjective errors. Although semi-automated methods have been proposed (Song et al. 2019), they still have difficulty assessing uncertainty (Ouyang et al. 2023), which is essential in guiding the investigation. By contrast, implicit modeling is fast, less subject to subjective interference, and built on a strong mathematical foundation. With the current growth of data science, implicit modeling has received increasing attention and has played an important role in uncertainty assessment, survey work guidance, and decision-making (Guo et al. 2022).
Implicit modeling essentially employs spatial interpolation (Lorensen and Cline 1987): a spatial surface function is fitted to the sampled data, and the surface is visualized only in the final output. Implicit modeling generates a scalar field for the entire geological study area (Guo et al. 2021; Lipus and Guid 2005; Scalzo et al. 2022; Zhong et al. 2021), and common reconstruction techniques include the radial basis function (RBF) method (Macêdo et al. 2011), moving least squares (MLS) (Wan and Li 2022), and surface reconstruction based on stochastic velocity fields (Yang et al. 2019). Recently, machine learning has been integrated into implicit modeling to quickly generate more accurate models. Guo et al. (2021) proposed a borehole-based implicit modeling approach that offers new possibilities for controllable 3D automated modeling. Shi et al. (2021) employed the marching cubes (MC) method to reconstruct the interpolating implicit function surface. Combining implicit modeling with machine learning can enhance the overall modeling accuracy by leaving the implicit surface-solving process to the algorithm (Hou et al. 2023; Wang et al. 2023; Yang et al. 2022). However, several challenges remain for implicit modeling: 1) assessing the limited model uncertainty; 2) providing accurate implicit specifications for non-renewable models. Geological conditions with strong anisotropy and sparse survey data commonly make it difficult even for experienced geological experts to infer a definitive geological structure, with uncertainty being the most acute challenge (Lindsay et al. 2012; Madsen et al. 2022; Ouyang et al. 2023; Zhao et al. 2023a). Owing to formation complexity, measurement uncertainty, data scatter and heterogeneity (Shi et al. 2021), surface construction uncertainty (Schaaf and Bond 2019), and random sampling error, uncertainty exists in most geological model construction areas. Conventional experience can resolve the formation and interpretation of the surface geological structure, but not deep formations or large-scale evaluation (Wang et al. 2022). Deterministic models are generally not appropriate for examining the uncertainty of geological structure, and they hinder model update evaluations. With the implicit modeling method, uncertainty can be expressed (Grose et al. 2018; Røe et al. 2014) on the basis of geostatistics. With the aid of quantitative simulations, it is possible to locate formation zones with a high degree of certainty. Assessing the level of uncertainty then makes it possible to recommend supplementary boreholes, improving the completeness and accuracy of the results.
Many machine learning algorithms, such as neural networks (Shi and Wang 2022, 2021b; Titus et al. 2022), mixture density networks (Han et al. 2022), random forest, deep forest, and support vector machines (SVM), are used to enhance accuracy in lithology identification, geological tectonic prediction, and geological modeling. Formation and mineral prediction and identification have been widely applied in geological disciplines (Deng et al. 2022; Harris et al. 2023; Ren et al. 2022; Zhao et al. 2023b). While machine learning can identify stratum distributions precisely, several challenges must be addressed: 1) dependency on large training datasets; 2) difficulty in generalizing over a large area with a single model. The machine learning algorithm used in the implicit model therefore requires additional data processing to enhance the modeling accuracy, especially for refined features.
Considering the above engineering problems, such as frequent manual interaction, difficult model updating, sparse data, incomplete detail refinement, and difficult uncertainty assessment in geological modeling, an implicit modeling and uncertainty analysis method based on ensemble modeling is proposed and applied to engineering geological exploration, particularly for dam construction. The paper is structured as follows: Section 2 introduces the detailed workflow of the implicit modeling approach and the main research methods used. Section 3 introduces a real case study in Yunnan Province, China. Section 4 outlines the prediction of the model and discusses the implicit modeling rules, the triangular prism block method, and uncertainty analysis in this process. Section 5 presents the main results.
Discussion
Implicit modeling with machine learning
The discrete input of implicit modeling can be suitably combined with machine learning to realize automated modeling. After simple processing, the drilling data can be converted into a training set. The discretized output can be visualized quickly, making it easy for decision-makers to readjust the model. Combining implicit modeling with machine learning essentially transforms the tedious processes of determining topological relationships and constructing surfaces into classification algorithm selection and hyperparameter adjustment. This paper proposes a set of geological implicit modeling and uncertainty analysis methodologies based on the triangular prism block method. The accuracies of the seven popular algorithms reach approximately 0.9, with an R^{2} value of more than 0.95 between the predicted surface and the actual interface. This shows that the overall model constructed using the implicit method meets the accuracy requirements of general engineering. Additionally, the local model fits well with the surface model of the stratigraphic interface, resulting in smooth surfaces.
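The conversion from drilling data to a training set, followed by voxel classification, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the borehole records, the 5 m sampling step, the function names, and the use of a plain k-nearest-neighbour vote as a stand-in for the seven classifiers evaluated in the paper.

```python
import math
from collections import Counter

# Hypothetical borehole logs: (x, y, [(top_z, bottom_z, stratum), ...]).
BOREHOLES = [
    (0.0, 0.0, [(100.0, 80.0, "S1"), (80.0, 40.0, "S2")]),
    (50.0, 0.0, [(100.0, 70.0, "S1"), (70.0, 40.0, "S2")]),
    (0.0, 50.0, [(100.0, 85.0, "S1"), (85.0, 40.0, "S2")]),
]


def boreholes_to_samples(boreholes, step=5.0):
    """Discretize each logged interval into (x, y, z, stratum) samples."""
    samples = []
    for x, y, intervals in boreholes:
        for top, bottom, stratum in intervals:
            z = top - step / 2.0  # first sample sits half a step below the top
            while z > bottom:
                samples.append((x, y, z, stratum))
                z -= step
    return samples


def knn_predict(samples, point, k=3):
    """Majority vote among the k nearest samples (stand-in classifier)."""
    nearest = sorted(samples, key=lambda s: math.dist(s[:3], point))[:k]
    votes = Counter(s[3] for s in nearest)
    return votes.most_common(1)[0][0]


def classify_voxels(samples, xs, ys, zs, k=3):
    """Assign a stratum label to every voxel centre of a regular grid."""
    return {
        (x, y, z): knn_predict(samples, (x, y, z), k)
        for x in xs
        for y in ys
        for z in zs
    }
```

Swapping `knn_predict` for another classifier leaves the rest of the pipeline unchanged, which is the sense in which modeling reduces to algorithm selection and hyperparameter adjustment.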
Refined characterization of geological entity with divide-and-conquer tactic
Methods of dividing the region
The processing of discrete points is also a major research problem that directly affects the classifier's prediction results. The divide-and-conquer tactic serves as a crucial preprocessing step in implicit modeling. Here, we supplement and compare the divide-and-conquer method with different regional segmentation methods. Through the same process as Table 11, we get the results of Table 18. The classification algorithms perform differently under different segmentation methods, but the Delaunay-triangular prism performs remarkably well. The Delaunay-triangular prism presented here employs the characteristics of the triangular Delaunay mesh and is coordinated with the corresponding classification algorithms to refine the local details. This strategy allows detailed formation situations to be captured (Ji et al. 2023), and different sub-blocks can be processed via different approaches. Different sampling point patterns affect the classifiers differently. The divide-and-conquer tactic reduces the data volume and classification dimension and avoids the shortcomings of any single classification algorithm. Hence, the algorithm can readily identify the optimal classification in the feature space, and the simple, repetitive operation ensures both the automation and accuracy of the modeling.
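The partition step behind the triangular prism blocks can be sketched as follows. The sketch assumes the triangulation is already available as vertex coordinates plus index triples (for example from a Delaunay routine), and the function names are hypothetical. It only shows how sample points are routed to the prism whose plan-view triangle contains them, so that each block can then be given its own classifier.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of the 2D point p in triangle (a, b, c)."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    w_a = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    w_b = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return w_a, w_b, 1.0 - w_a - w_b


def find_prism(p_xy, vertices, triangles, eps=1e-9):
    """Index of the first triangle whose plan view contains p_xy, else None."""
    for idx, (i, j, k) in enumerate(triangles):
        weights = barycentric(p_xy, vertices[i], vertices[j], vertices[k])
        if all(w >= -eps for w in weights):
            return idx
    return None


def split_by_prism(samples, vertices, triangles):
    """Group (x, y, z, stratum) samples by the prism block that contains them."""
    blocks = {idx: [] for idx in range(len(triangles))}
    for sample in samples:
        idx = find_prism(sample[:2], vertices, triangles)
        if idx is not None:
            blocks[idx].append(sample)
    return blocks
```

Because a prism extends vertically through the whole model, only the (x, y) coordinates decide membership; points on a shared edge are assigned to the first matching triangle.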
Table 11
The performances of different methods of dividing the region, dam site area with 149 boreholes
Accuracy | 0.973 | 0.906 | 0.868 |
R^{2} | 0.9965 | 0.9535 | 0.9464 |
RMSE (m) | 5.016 | 18.301 | 18.470 |
Mean information entropy | 0.096 | 0.225 | 0.256 |
Modeling confidence | 98.8% | 96.4% | 95.7% |
By employing the EMDCT, the accuracy of the algorithm is improved in regions of sparse data (the overall accuracy is improved by 0.013). Further gains are obtained even in data-intensive areas, where the modeling effect is already good (the accuracy of the dam site area is enhanced by 0.011, the correlation coefficient of each interface increases by an average of 0.002, and the RMSE is reduced by approximately 1 m). Compared with the general explicit modeling method, EMDCT improves the average layer error by 6.8 m and R^{2} by 0.016 (see Tables 5 and 6). Our supplementary experiments indicate that the modeling effect of EMDCT is better than that of traditional methods. Although no single classifier suits every data situation and geological structure, using appropriate algorithms in different blocks achieves better results. From the perspective of uncertainty, we calculate the information entropy of two models, one using SVM alone and one using EMDCT. The average information entropy of the former is 0.119 (corresponding to 98.4% confidence), and that of the latter is 0.096 (corresponding to 98.8% confidence), which indicates that the EMDCT prediction is more stable. EMDCT thus improves not only model accuracy but also modeling confidence.
Uncertainties in the geological model
When dealing with sparse data, complex structures, and random sampling, uncertainty quantification is necessary to assess overall geological conditions. The class assigned to each voxel is the certainty expression of implicit modeling; in contrast, the probability of each stratum and the information entropy of each voxel are its uncertainty expression. If the geological space is assumed to resemble a binary information entropy system, this expression can be roughly matched to a confidence level. Through cloud maps and confidence levels, readers can intuitively grasp the geological conditions.
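The link between information entropy and confidence can be made concrete with a short sketch. It assumes Shannon entropy with base-2 logarithms and the binary system mentioned above, which is our reading rather than a formula stated in this section; the function names are hypothetical. Confidence is recovered by numerically inverting the binary entropy on the interval [0.5, 1].

```python
import math


def voxel_entropy(probs, base=2.0):
    """Shannon entropy of the stratum probabilities predicted for one voxel."""
    return -sum(p * math.log(p, base) for p in probs if p > 0.0)


def binary_entropy(p):
    """Entropy of a two-outcome system with probabilities p and 1 - p."""
    return voxel_entropy([p, 1.0 - p])


def confidence_from_entropy(h, tol=1e-9):
    """Invert the binary entropy on [0.5, 1] by bisection.

    Binary entropy falls monotonically from 1 at p = 0.5 to 0 at p = 1,
    so the confidence p matching an entropy h in [0, 1] is unique.
    """
    lo, hi = 0.5, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binary_entropy(mid) > h:
            lo = mid  # entropy still too high: confidence must be larger
        else:
            hi = mid
    return (lo + hi) / 2.0
```

Under this assumption the pairs reported in this section are reproduced to within rounding: a mean entropy of 0.119 maps to about 98.4% confidence and 0.096 to about 98.8%.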
Uncertainty analysis provides two advantages for implicit geological modeling: (i) Enabling the geological model to accommodate different possibilities; for instance, in Fig. 14(a), the interface is almost confined to the light blue space at the dam location, but it can occur anywhere in this narrow space, which means many possibilities for the implicit renewable geological model in the form of scatter; (ii) Guiding geological discoveries; uncertainty analysis assists in locating new boreholes that effectively augment the implicit modeling algorithm, as it reflects the information that is expected to be added to the algorithm. Boreholes around the dam site are very few. When boreholes are added randomly, without uncertainty analysis, the overall accuracy of the working area is only 0.951. When boreholes are added based on uncertainty analysis, the accuracy of the whole working area reaches 0.962 (see Table 10, 0.011 higher), and the visualized formation structure is significantly closer to the actual situation than without uncertainty analysis (see Fig. 17), which is equivalent to saving about 10 boreholes.
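One simple way to turn a voxel entropy model into borehole suggestions is sketched below. The selection rule (rank vertical columns by mean entropy and enforce a minimum spacing from existing and already chosen boreholes) is an assumption made for illustration, not the criterion used in the case study, and the function names are hypothetical.

```python
import math
from collections import defaultdict


def column_mean_entropy(voxel_entropy):
    """Average the entropy of all voxels sharing the same (x, y) column."""
    sums = defaultdict(lambda: [0.0, 0])
    for (x, y, _z), h in voxel_entropy.items():
        sums[(x, y)][0] += h
        sums[(x, y)][1] += 1
    return {xy: total / count for xy, (total, count) in sums.items()}


def propose_boreholes(voxel_entropy, existing, n_new=1, min_spacing=0.0):
    """Pick up to n_new columns with the highest mean entropy.

    A column is skipped when it lies closer than min_spacing to an
    existing borehole or to a column that has already been chosen.
    """
    if n_new <= 0:
        return []
    means = column_mean_entropy(voxel_entropy)
    ranked = sorted(means.items(), key=lambda item: item[1], reverse=True)
    chosen = []
    for xy, _mean in ranked:
        occupied = list(existing) + chosen
        if all(math.dist(xy, other) >= min_spacing for other in occupied):
            chosen.append(xy)
        if len(chosen) == n_new:
            break
    return chosen
```

After each drilling round the model would be retrained and the entropy field recomputed, so the ranking reflects the information still missing from the model.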
Limitations
The examples throughout this paper are real engineering cases with good results in hydraulic engineering. The case has the following characteristics, which make it suitable for this workflow: (i) The distribution of boreholes is uneven and, in some areas, very sparse. (ii) The layered structure is obvious. (iii) The stratigraphic scale does not cross orders of magnitude.
Given the scalability and applicability of the method, this section analyzes the limitations of the workflow based on the characteristics of the engineering case. The method has limitations in dealing with different practical cases. (i) Data fusion needs to be studied. Boreholes are the most commonly used modeling data in geological engineering, as most engineering practice has shown. Profiles, geological exposure lines, physical data, etc., can also be used, and many geological modeling methods fuse multi-source data. The proposed approach provides valuable insights for data fusion. The study used only borehole data, not profiles or other geological data. Under the proposed method, profile data can also be included through sampling, being processed into "location(x, y, z), attribute(S)" sample points for machine learning. After the boundaries on the profile are interpreted based on their spatial positions, they are divided into sampling points and added to the training set. The integration of multi-source data can strengthen the present approach in the near future. (ii) The modeling of structural lenses needs to be improved. Structural lenses, solids characterized by a thick middle and thin periphery, typically signify compressive stress-induced structural fracture zones. These features, although smaller than formation structures, have complex shapes. When the boreholes contain no lens information at all, the proposed method cannot infer a lens, even if one exists between the boreholes. When the boreholes contain only a little lens information, restoring the lens accurately by implicit modeling depends on how the voxel size of the grid model is determined. In this context, detailed structures require smaller voxels to represent the boundary. If a finer grid is selected, more voxels are needed to form the grid model and uncertainty model for the same working area, and the amount of computation increases rapidly: halving the voxel edge length multiplies the number of voxels by eight. We will try a site-specific hierarchical grid approach in the future.
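The cost of refining the grid can be estimated directly, since the number of voxels in a regular grid scales with the inverse cube of the voxel edge length. The sketch below uses a hypothetical working-area extent and a hypothetical function name purely for illustration.

```python
import math


def voxel_count(extent, edge):
    """Number of cubic voxels of side `edge` needed to cover a box.

    `extent` is the (length_x, length_y, length_z) of the working area,
    in the same unit as `edge`.
    """
    nx, ny, nz = (math.ceil(length / edge) for length in extent)
    return nx * ny * nz


# Hypothetical 1000 m x 800 m x 200 m working area.
EXTENT = (1000.0, 800.0, 200.0)
for edge in (10.0, 5.0, 2.5):
    print(f"edge = {edge} m: {voxel_count(EXTENT, edge)} voxels")
```

Each halving of the edge length multiplies the voxel count, and with it the number of classifier and entropy evaluations, by eight, which is why a hierarchical grid that refines only around detailed structures is attractive.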
Conclusions
The 3D geological model provides the basis for the subsequent analysis of the engineering construction of the dam site. In view of current geological modeling problems, such as low local accuracy, difficulty in quantifying uncertainties, and difficulty in guiding model updates, a geological implicit modeling and uncertainty analysis approach is proposed in this study. The contributions include:
(i) Implicit modeling with machine learning. Combining implicit modeling with machine learning reduces modeling to classification algorithm selection and hyperparameter adjustment, which is more convenient than reconstructing the potential functions.
(ii) Application of the EMDCT. EMDCT can readily identify the optimal classification in the feature space, which enhances the reproduction of model details.
(iii) Uncertainty analysis. It quantifies the uncertainty of the whole geological model and guides further geological exploration and model updating. In this study, uncertainty analysis can reduce the number of boreholes by approximately 10.
Implicit modeling via machine learning techniques is a relatively novel approach in this field. The EMDCT and uncertainty analysis are two critical processes aimed at improving modeling effectiveness. In this paper, data sets consisting of real boreholes and sampling points within the area are collected by a divide-and-conquer tactic. In general, the overall predictive accuracy of the EMDCT, which refines implicit features, is higher. Specifically, the overall accuracy of the working area is 0.922 and the accuracy of the dam site is 0.973. The EMDCT-based analysis is less sensitive to the number and distribution of boreholes and is generally robust. Uncertainty assessment is conducted using information entropy, and a contingency-table correlation test is adopted to establish the relationship between the magnitude of uncertainty and the accuracy of formation prediction. The t-test confirms, at a significance level of 0.005, that supplementary boreholes placed according to the uncertainty assessment noticeably improve the modeling effect. Based on the results of uncertainty analysis, the overall accuracy of the working area is 0.962 and the accuracy of the dam site is 0.974.
The study used only borehole data, not profiles or other geological data. Under the proposed method, profile data can also be included through sampling. After the boundaries on the profile are interpreted based on their spatial positions, they are equally divided into sampling points and added to the training set. The integration of multi-source data can strengthen the present approach in the near future. The proposed approach, which combines geological implicit modeling with the triangulation ensemble model and uncertainty analysis, provides valuable insights for the rapid generation of models, borehole layout, and uncertainty assessment in the early stages of hydraulic engineering geological exploration. Its implications for engineering practitioners are as follows. (i) The main significance of this research is that it can help engineers understand geological conditions more deeply and guide geological exploration work. Through uncertainty analysis, engineers can clearly locate areas with complex geological structures, find the areas most worthy of investigation, rationally arrange the subsequent investigation work, and improve productivity. For example, in the case given in Section 3.5.3, our method can save the effort of 10 boreholes (Table 9). Through our method, the accuracy of the whole working area can be improved from 0.909 to 0.962 (Table 10). The modeling process of the divide-and-conquer method and implicit modeling benefits the uncertainty analysis and the layout of new boreholes. In the preliminary planning stage, the general geological structure of the study area can be obtained with few boreholes. (ii) The method takes the uncertainty information of the model into account. The general relationship established between information entropy and confidence gives engineers a deeper understanding of the grid model. The uncertainty analysis method in this paper is not only an evaluation of the confidence level of the modeling effect but also a new expression of geological uncertainty. Furthermore, the reasonable integration of uncertainty is conducive to the quantitative evaluation of the reliability of calculation results in the numerical simulation of geological safety and stability. (iii) Implicit modeling offers inspiration for geological engineers and is a novel workflow for building geological models automatically and in fine detail.