In this study, a dynamic load-balancing (DLB) technique based on the sampling method is developed for MPM using higher-order B-spline basis functions in parallel MPI calculations based on domain decomposition, in order to perform large-scale, long-duration landslide simulations within a realistic computation time. Higher-order B-spline basis functions have a wider range of influence, spanning several cells, than standard basis functions; nevertheless, the proposed DLB technique dynamically adjusts the size of each computational subdomain according to the material point distribution so that the material points are almost equally distributed across all cores. This mitigates the load imbalance between cores and allows the advantages of parallel computation to be fully exploited. Specifically, the novel contribution of this study is that the domain decomposition allows for proper communication between control points even when the physical regions assigned to the cores are staggered or non-adjacent and the area of influence of the B-spline basis functions spans multiple subdomains. In the numerical examples, a quasi-3D benchmark solid column collapse problem is computed for multiple core configurations to verify the effectiveness of the DLB method in terms of scalability and parallelization efficiency. A simulation of a full 3D column collapse problem also illustrates the applicability of the proposed DLB method to large-scale disaster simulations. Finally, to demonstrate the promise and capability of the DLB technique within the MPM algorithm, a full-scale landslide disaster simulation is carried out to show that the method can handle calculations of practical size.
Notes
Soma Hidano and Shaoyuan Pan contributed equally to this work.
1 Introduction
Large landslides and debris flows with significant deformation, movement, and destruction of geostructures occur regularly around the world [1‐5], posing a major threat to local infrastructure and even leading to severe casualties among the local population. However, the process of such an event is not well understood due to the complicated behavior of geomaterials. Although some model experiments have been reported in the past, it is practically impossible to reproduce actual phenomena at the experimental scale, and the mechanisms governing experiments and actual events are not always the same. In recent years, however, improved computer performance and the development of numerical analysis tools have gradually made large-scale disaster simulation possible. Therefore, there is now a need to develop effective and reliable numerical analysis tools in order to elucidate the mechanisms of phenomena occurring in the field and to make preliminary assessments and predictions of similar events that may occur in the future.
Mesh-based schemes, such as the finite element method (FEM), are the most commonly used numerical methods in the civil engineering industry, and their capabilities have been confirmed in solving a variety of geotechnical engineering problems. However, these schemes necessarily require complex remeshing algorithms and are not well suited to complex large-deformation problems. On the other hand, meshless or particle-based methods, such as the smoothed particle hydrodynamics (SPH) method [6‐10], the particle finite element method (PFEM) [11‐14] and the material point method (MPM) [15‐21], have rapidly developed and are now widely used due to their robust handling of large deformations. Notable among these methods is the MPM, which has contributed to remarkable progress especially in the geotechnical engineering field and has become the method of choice for many researchers.
MPM is a hybrid Lagrangian–Eulerian method originally proposed by Sulsky et al. [22] to solve continuum problems based on solid mechanics, and it stemmed from the particle-in-cell (PIC) [23, 24] and fluid-implicit-particle (FLIP) [25] methods. Lagrangian material points and an Eulerian background grid are introduced for the representation of a continuum body to make use of the merits of both descriptions. The continuum body is composed of a finite number of material points, which are referred to as “particles” and tracked throughout the whole deformation process. Each particle carries material information such as mass, volume, position, velocity, stress, deformation gradient, and history-dependent variables. On the other hand, the background grid, which can be called a “mesh” in analogy with an FE mesh, covers the computational domain and is spatially fixed throughout the entire calculation. In essence, MPM can be regarded as a variant of FEM, except that numerical integration is performed at material points that move over time, in contrast to the fixed quadrature points of the FEM. Therefore, the advantages of FEM are retained. Also, unlike PFEM, MPM does not require remeshing to avoid severe mesh distortion thanks to the combined use of the Lagrangian description and the Eulerian grid. Moreover, unlike SPH, MPM requires no search for neighboring particles, which reduces computational costs.
MPM uses a standard structured (cubic) mesh, which facilitates parallelization through domain decomposition and improves calculation efficiency. In conventional parallel computing, it is common practice to maintain a consistent domain decomposition from the initial state, which is suitable for the Eulerian FEM. In the Eulerian FEM [20, 26, 27], where the mesh remains fixed and only nodal calculations are considered, preserving the initial partition state has minimal impact on the calculation efficiency. However, MPM introduces a unique challenge. That is, unlike the Eulerian FEM, MPM needs to take into account not only nodal calculations but also the movement of material points between the partitioned domains. As the calculation progresses, material points move, and a situation can arise where material points are dense in some areas and almost nonexistent in others, leading to large disparities in computing efficiency between the partitioned domains. This imbalance prevents the benefits of parallel computing from being fully exploited.
To address this problem and mitigate computational imbalances as much as possible, especially in scenarios involving large-scale, long-duration phenomena, several high-performance computing (HPC) techniques for particle methods [28, 29] have been developed in recent years. In this context, we should turn our attention to large-scale simulations carried out in other research fields. For example, N-body simulations in astrophysics require a huge number of particles to represent various systems, such as globular clusters, galaxies, clusters of galaxies, and the large-scale structure of the Universe. Advanced domain decomposition with dynamic load balancing is essential for achieving high performance on supercomputers, as self-gravity drives the anisotropic evolution of structures. Various types of domain decomposition (orthogonal recursive bisection [30], multi-section [31], and space-filling curve (SFC) based decomposition [32]) and dynamic load balancing (DLB) (load balancing based on work estimation [33] and the sampling method [34]) have been proposed in previous studies. Such techniques are known to have contributed to the enormous performance of the world’s fastest supercomputers of their time [35‐37]. MPM is therefore also expected to benefit from these continued, long-term developments in astrophysics.
In this context, several studies have recently applied dynamic load balancing to MPM; see, for example, Kumar et al. [38] using CPUs and Dong and Grabe [39], Zhang et al. [17], and Hu et al. [40] using GPUs. In particular, the previous work by Zhang et al. [17] applied this approach to a real disaster to achieve a large-scale simulation. Their study features the use of GIMP basis functions in the MPM to reduce the “cell-crossing error” associated with the accuracy of spatial integration. Here, the cell-crossing error refers to the numerical error that occurs when a material point crosses a background cell boundary because of the mere \(C^0\) continuity of the linear basis functions employed in the standard MPM, whereas GIMP achieves \(C^1\) continuity near cell boundaries. Another strategy to address this issue is to use the more versatile B-spline basis functions, which can be extended to the extended B-spline basis functions [41‐43]. These bases are effective in preventing a loss of integration accuracy in regions with a small number of material points. To our knowledge, this is the first time that a dynamic load balancing technique has been incorporated into the MPM enhanced by B-spline basis functions to achieve higher computational efficiency for large-scale problems.
Moreover, it is important to highlight that although many MPM algorithms use GIMP as a basis function to overcome the “cell-crossing error” between background cells, the accuracy of results obtained using GIMP is not as good as that of B-splines. Since GIMP basis functions only achieve \(C^1\) continuity at the cell edges and still use a linear basis function inside the cell, their gradient is constant within a cell, which significantly affects numerical accuracy. On the other hand, higher-order B-spline basis functions achieve \(C^1\) continuity both at the cell edges and within the cell, ensuring at least \(C^0\) continuity of the gradients and thereby greatly improving numerical accuracy. For more detailed information, see the paper by Zhao et al. [44], which compares the results obtained by GIMP and B-spline basis functions through several numerical examples.
Against the above background, in this study, the DLB technique in conjunction with the sampling method is introduced into the MPM using higher-order B-spline basis functions to reduce the load bias between processes and make the algorithm suitable for large-scale simulation. The DLB technique adaptively adjusts the size of the decomposed subdomains according to the number of material points so that the material points are shared as equally as possible among the cores, thereby reducing the imbalance between them. Nevertheless, when the sampling method is applied to divide the domain, staggered subdomains are inevitably generated. Although such a subdomain arrangement is unfavorable from the point of view of MPI communication, the algorithm is devised so that appropriate communication can take place between the control points of the B-spline basis functions allocated to the cores of the corresponding subdomains. In addition, it should be noted that the mapping of physical quantities at material points between such subdomains is also carried out.
The paper is structured as follows: In Sect. 2, the basics of the MPM are presented. In Sect. 3, we provide a detailed description of the DLB technique and how to incorporate it into the MPM, specifically when using quadratic B-spline basis functions. In Sect. 4, several numerical examples are presented to investigate the effectiveness of the DLB technique in improving the computational efficiency of the MPM and to demonstrate its applicability to a full-scale disaster simulation. In Sect. 5, conclusions and future research directions are outlined.
2 Material point method
2.1 Governing equations
We consider a continuum body that occupies domain \(\Omega\). The strong form of the momentum balance equation for the Lagrangian description is written as
$$\begin{aligned} \rho a_i = \frac{\partial \sigma _{ij}}{\partial x_j} + \rho b_i \quad \text {in} \ \Omega , \end{aligned}$$
(1)
where \(\rho\) is the mass density, \(a_i\) is the acceleration, \(\sigma _{ij}\) is the Cauchy stress, and \(b_i\) is the body force; subscripts \(i, j = 1, 2, 3\) are used to denote components of vectors and tensors. The corresponding weak form of the momentum balance equation is given as
$$\begin{aligned} \int _{\Omega } \omega _i \rho a_i \, \textrm{d}v = -\int _{\Omega } \frac{\partial \omega _i}{\partial x_j} \sigma _{ij} \, \textrm{d}v + \int _{\Omega } \omega _i \rho b_i \, \textrm{d}v + \int _{\partial \Omega _{\sigma }} \omega _i t_i \, \textrm{d}s, \end{aligned}$$
(2)
where \(\omega _i\) is a test function that satisfies the condition \(\omega _i =0\) on Dirichlet boundary \(\partial \Omega _{v}\), and \(t_i\) is the prescribed traction vector acting on Neumann boundary \(\partial \Omega _{\sigma }\).
2.2 Discretization
In the spatial discretization of the MPM, the domain integral of a physical quantity \(\Phi ({\varvec{x}})\) is approximated as follows:
$$\begin{aligned} \int _{\Omega } \Phi ({\varvec{x}}) \, \textrm{d}v \approx \sum _{p=1}^{N^\text {mp}} \Phi ({\varvec{x}}_p) \, \Omega ^{p}, \end{aligned}$$
(3)
where \(N^\text {mp}\) is the total number of material points and \(\Omega ^{p}\) is the volume associated with material point p, which is identified with position vector \({{\varvec{x}}}_p\). Field variables at a material point, such as mass and velocity, can be approximated by interpolation using the basis functions \(N_{\alpha }({\varvec{x}})\) associated with the Eulerian grid and their values \(\Phi _{\alpha }\) at nodes (or grid points) \(\alpha\) as
$$\begin{aligned} \Phi ^{h}({\varvec{x}}) = \sum _{\alpha =1}^{N^\text {n}} N_{\alpha }({\varvec{x}}) \, \Phi _{\alpha }, \end{aligned}$$
(4)
where \(N^\text {n}\) is the total number of nodes. Here, the variables with superscript ‘h’ indicate their approximations in this manner.
In this study, we employ quadratic B-spline basis functions to suppress the cell-crossing error [45, 46] that often arises in the original MPM with linear basis functions. When quadratic B-spline basis functions are used, the points at which variables are computed do not lie on the cell edges; hence, instead of calling them “nodes”, we call them “control points”, meaning that they control the function profiles. On the other hand, when linear basis functions are used for comparison in subsequent sections, we will continue to refer to “nodes” rather than “control points”. Thus, \(\Phi _{\alpha }\) represents a variable at the control point identified by the subscript \(\alpha\), and \(N^\text {n}\) then denotes the number of control points.
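For concreteness, the following is a minimal sketch of a one-dimensional quadratic B-spline shape function and its gradient on a uniform grid. This is the standard textbook form on a uniform knot layout, not an excerpt from our code; boundary-modified functions near the domain edges are omitted and all names are illustrative.

```python
import numpy as np

def quadratic_bspline(x, x_cp, h):
    """Quadratic B-spline shape function associated with control point x_cp on a
    uniform grid of cell size h, and its spatial gradient dN/dx.
    The support extends 1.5 cells on each side of the control point."""
    r = (x - x_cp) / h                 # normalized signed distance
    a = abs(r)
    if a <= 0.5:
        return 0.75 - r * r, -2.0 * r / h
    if a <= 1.5:
        return 0.5 * (1.5 - a) ** 2, -np.sign(r) * (1.5 - a) / h
    return 0.0, 0.0

# partition of unity: the three overlapping functions sum to 1 at an interior point
h, x = 0.02, 0.013
cps = [0.0, h, 2 * h]                  # control points whose support covers x
print(sum(quadratic_bspline(x, xc, h)[0] for xc in cps))   # -> 1.0
print(sum(quadratic_bspline(x, xc, h)[1] for xc in cps))   # -> 0.0 (gradients)
```

At any interior point, three such functions overlap in each direction and sum to unity, which is the property exploited in the interpolation of Eq. (4).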
The substitution of Eqs. (3) and (4) into Eq. (2) yields the following semi-discrete momentum balance equation:
The semi-discrete equations derived above are discretized in time, and the numerical solution is obtained at each discrete time. In what follows, the Lth discrete time is denoted by \(t^L\), and the time increment to the \((L+1)\)th time is denoted by \(\Delta t\equiv t^{L+1}-t^{L}\).
The time-discretized form of the momentum balance equation in Eq. (5) can be written as
which is solved explicitly for \({\varvec{a}}_{i}^{L}\) by setting \(\theta =0\). After the control point acceleration \({\varvec{a}}_{i}^{L}\) is obtained, the control point velocity in the \(x_i\)-direction is updated as
Prior to updating the state variables at each material point, we adopt the MUSL procedure [24] to “refine” the control point velocity vector. That is, the following mapping process is additionally performed:
The numerical algorithm for the MPM with the explicit time integration scheme is summarized as follows (a minimal sketch of a single time step is given after the list):
(i)
Map the information carried by material points to the background grid. The control point values of mass and velocity, \(M_{\alpha }^{L}\) and \({\varvec{v}}_{i}^{L}\), are calculated using Eqs. (6) and (13), respectively;
(ii)
Solve Eq. (9) for the control point acceleration vector \({\varvec{a}}_i^{L}\);
(iii)
Update the velocity and position vectors at material points using Eqs. (11) and (12), respectively;
(iv)
Following the procedure provided in Appendix 1, update the Cauchy stresses \(\sigma _{ij}^{L+1}\) by applying the return mapping algorithm, and then compute the logarithmic elastic strain \(\varepsilon _{ij}^{\textrm{e}, L+1}\) at the material points as
where G and K are the shear and bulk moduli of elasticity, respectively.
(v)
Initialize the computational grid for the next time step.
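The following self-contained one-dimensional sketch walks through steps (i)–(v) for a single time step, including the MUSL remapping. Linear hat functions and a toy linear-elastic stress update are used purely for brevity (the actual scheme uses quadratic B-splines and the return mapping of Appendix 1), and all array names and parameter values are illustrative assumptions rather than our production code.

```python
import numpy as np

# toy 1D setup on a uniform grid
h, n_nodes, dt, g = 0.1, 11, 1.0e-4, -9.81      # cell size, nodes, time step, gravity
E, rho0 = 1.0e7, 1500.0                         # Young's modulus, initial density
x_n = np.arange(n_nodes) * h                    # node (control point) coordinates
x_p = np.linspace(0.05, 0.45, 9)                # material point positions
m_p = rho0 * 0.05 * np.ones_like(x_p)           # particle masses
V_p = 0.05 * np.ones_like(x_p)                  # particle volumes
v_p = np.zeros_like(x_p)                        # particle velocities
sig_p = np.zeros_like(x_p)                      # particle (1D) stresses

def shape(xp):
    """Node indices, values and gradients of the linear hat functions at xp."""
    i = int(xp / h)
    xi = (xp - x_n[i]) / h
    return [i, i + 1], [1.0 - xi, xi], [-1.0 / h, 1.0 / h]

# (i) particle-to-grid mapping of mass, momentum and forces
m_n = np.zeros(n_nodes); mv_n = np.zeros(n_nodes); f_n = np.zeros(n_nodes)
for p in range(len(x_p)):
    idx, N, dN = shape(x_p[p])
    for a, Na, dNa in zip(idx, N, dN):
        m_n[a]  += Na * m_p[p]
        mv_n[a] += Na * m_p[p] * v_p[p]
        f_n[a]  += -dNa * sig_p[p] * V_p[p] + Na * m_p[p] * g   # internal + body force

# (ii) nodal acceleration and velocity (lumped-mass explicit solve)
active = m_n > 1.0e-12
a_n = np.where(active, f_n / np.where(active, m_n, 1.0), 0.0)
v_n = np.where(active, mv_n / np.where(active, m_n, 1.0), 0.0) + dt * a_n

# (iii) update particle velocity and position
for p in range(len(x_p)):
    idx, N, _ = shape(x_p[p])
    v_p[p] += dt * sum(Na * a_n[a] for a, Na in zip(idx, N))
    x_p[p] += dt * sum(Na * v_n[a] for a, Na in zip(idx, N))

# MUSL: re-map the refined particle velocities to the grid before the strain update
mv_n[:] = 0.0
for p in range(len(x_p)):
    idx, N, _ = shape(x_p[p])
    for a, Na in zip(idx, N):
        mv_n[a] += Na * m_p[p] * v_p[p]
v_n = np.where(active, mv_n / np.where(active, m_n, 1.0), 0.0)

# (iv) velocity gradient and a toy linear-elastic stress update at the particles
#      (standing in for the return mapping of Appendix 1)
for p in range(len(x_p)):
    idx, _, dN = shape(x_p[p])
    dvdx = sum(dNa * v_n[a] for a, dNa in zip(idx, dN))
    sig_p[p] += dt * E * dvdx

# (v) the grid arrays are simply zeroed at the start of the next step
```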
Fig. 1
Illustration of conventional domain decomposition for MPM
3 HPC technique for MPM
A new dynamic load balancing (DLB) method has been developed to realize MPI parallel computing for MPM analyses using B-spline basis functions. It adaptively changes the domain decomposition pattern so that the CPU load remains equalized as the material points move. DLB itself is not new: several previous studies have applied it to the MPM with GIMP basis functions, which are piecewise linear and provide only \(C^1\) continuity at cell boundaries [38‐40]. It should be noted that this study primarily focuses on the DLB method incorporated into the MPM using quadratic B-spline basis functions, and we have not compared it with the MPM with GIMP basis functions since the GIMP scheme is not implemented in our program code. Therefore, this point remains open to further discussion. Nevertheless, this is the first study to ensure proper communication during parallelization when using B-spline basis functions, which have various advantages over GIMP.
3.1 Dynamic load balancing (DLB) technique
In conventional parallel computing, it is common practice to retain the initial domain decomposition, which is suitable for Eulerian mesh-based methods. However, MPM necessitates consideration of not only control point calculations but also the movement of material points between partitioned domains. As illustrated in Fig. 1, the initial mesh partitioning appears to be balanced, but can become unbalanced if material points move significantly over time. In the example in this figure, as the calculation progresses, the material points move and become concentrated in the domains assigned to cores 1 and 2, leaving only the minimal number of material points required for the calculation in the domains assigned to cores 3 and 4. Consequently, the calculation time for cores 1 and 2 significantly exceeds that of cores 3 and 4, resulting in a substantial disparity in computing efficiency. This imbalance prevents the full benefits of parallel computing from being exploited. To address this issue and mitigate the imbalance in computation and communication as much as possible, especially in scenarios involving large-scale and long-duration phenomena, we adopt the dynamic load balancing (DLB) technique proposed in [47], which dynamically changes the domain decomposition as needed to ensure that particles are equally distributed among all subdomains. The implementation was initially developed for the gravitational octree code GOTHIC [48, 49] and has been adapted for the MPM in this study.
In domain decomposition for particle methods such as SPH, there are generally no constraints on the shape of each subdomain. In space-filling curve (SFC)-based decomposition, however, bumpy interfaces between the decomposed subdomains arise naturally and are unsuitable for the grid-based calculations in MPM. Indeed, since equation solving and variable communication in MPM take place at the control points (or nodes) that make up the grid, it is preferable for each subdomain to be rectangular in shape.
The overall algorithm for dynamically dividing the entire domain into rectangular subdomains and then performing parallel computations is outlined in Algorithm 1 in the form of pseudo code, and the overall flowchart is provided in Appendix 2. Here, the DLB corresponds to Lines A2-1 and A2-2, while the MPM explained in Sect. 2 corresponds to Lines A0, A1 and A3. In Line A4, particle sorting is performed according to the Morton order to increase the cache-hit rate [48]. Among these operations, Line A2-1 is the core operation of the DLB and is executed to determine the subdomains based on the sampling method. Algorithm 2 describes its detailed procedure, which is explained in the following. It is assumed that the whole domain is initially divided into \(N_x^\text {sub}\), \(N_y^\text {sub}\) and \(N_z^\text {sub}\) subdomains in the x, y and z directions, respectively, and that these numbers of divisions are fixed throughout the simulation.
Algorithm 1
Pseudo code of MPM with DLB using quadratic B-spline basis functions.
Algorithm 2
Pseudo code for determining subdomains based on sampling method (Details of A2-1 in Algorithm 1).
Since the movement of particles in a single time step is not expected to be very large, the DLB operations only need to be applied once every several time steps (i.e., at a fixed time interval). That is, DLB is performed only when mod(\(t^L\), \(t^\textrm{int}_\textrm{DLB}\)) = 0, where \(t^L\) is the current time and \(t^\textrm{int}_\textrm{DLB}\) is the DLB interval. In the other time steps, we only communicate the information of particles that cross subdomain boundaries. In a DLB time step, the pattern of domain decomposition is determined by the sampling method described in Algorithm 2.
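The resulting driver loop can be sketched as follows; the function names are illustrative stubs standing in for the operations labeled A0–A4 in Algorithm 1, and the step-based trigger used here corresponds to the time-based condition mod(\(t^L\), \(t^\textrm{int}_\textrm{DLB}\)) = 0 above.

```python
# Illustrative stubs standing in for the operations referenced in Algorithm 1
def determine_subdomains_by_sampling():    pass   # Line A2-1 (Algorithm 2)
def exchange_cells_points_particles():     pass   # Line A2-2 (reassignment)
def migrate_boundary_crossing_particles(): pass   # per-step particle migration
def mpm_explicit_step():                   pass   # Lines A0, A1, A3 (Sect. 2)
def sort_particles_morton_order():         pass   # Line A4 (cache-friendly sort)

def run(n_steps, dlb_interval):
    for step in range(n_steps):
        if step % dlb_interval == 0:               # DLB only at the fixed interval
            determine_subdomains_by_sampling()
            exchange_cells_points_particles()
        else:                                      # otherwise migrate crossers only
            migrate_boundary_crossing_particles()
        mpm_explicit_step()
        sort_particles_morton_order()

run(n_steps=10_000, dlb_interval=100)              # e.g. the quasi-3D column setup
```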
A specific task in Algorithm 2 is to decide where the boundaries of the subdomains should be defined according to the computational load of each MPI process. To do this, the number of sampled particles in the ith MPI process is first calculated by the following formula:
$$\begin{aligned} {n_{i}^\textrm{samp}} = N_\textrm{tot} R f_i, \end{aligned}$$
(17)
where \(N_\textrm{tot}\) is the total number of particles in the entire domain and R is the global sampling rate. Also, \(f_i\) is the correction factor for the ith MPI process and is calculated from the CPU time \(t_i\) measured on the ith MPI process; when the CPU cost per particle is uniform, it is equivalent to \(N_i/N_\textrm{tot}\), with \(N_i\) being the number of particles in the ith subdomain.
First, for each subdomain, every \(N_i / n_{i}^\textrm{samp}\)th particle is sampled in local ID order, and once sampling has been completed for all subdomains, the sampled particles in each process (subdomain) are sorted in order of decreasing x-coordinate. Second, the root process, which is assigned to a designated subdomain, gathers the number and the x-coordinates of the sampled particles from all other processes. Third, the total number of sampled particles, \(n^\text {samp}\), is calculated. Fourth, the value \(n_x^\text {new}=n^\text {samp}/N_x^\text {sub}\) determines the approximate number of sampled particles in each new subdomain to be generated in the x-direction only. Fifth, on the root process, the minimum (left-hand boundary position) and maximum (right-hand boundary position) x-coordinates of each new subdomain are determined according to this number \(n_x^\text {new}\). This means that the whole domain is first divided into \(N_x^\text {sub}\) subdomains in the x-direction only, and each new subdomain contains approximately the same number of sampled particles with respect to the x-axis. Here, the last particle ID of the \(i_x\)th subdomain is determined from \(n^\textrm{samp}\times ((i_{x}+1)/N_x^\textrm{sub})\), and the boundary between the \(i_x\)th and \((i_{x}+1)\)th subdomains is placed between the “last particle ID” and the “last particle ID”+1. The same operation is performed for the y- and z-directions separately to determine the subdomain boundaries in each direction. It should be noted that, when dividing in the y-direction, the sampled particles in each subdomain generated by the x-direction split are sorted in order of decreasing y-coordinate. Note also that each subdomain boundary is rounded to the nearest control point to facilitate linkage with the grid-based calculation of the MPM.
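The sketch below illustrates the boundary search in one direction on a single process (MPI communication is omitted). The correction factor is assumed to take the form \(f_i = t_i / \sum _j t_j\), which is one natural choice consistent with the statement above that \(f_i\) reduces to \(N_i/N_\textrm{tot}\) when the CPU time per particle is uniform; all names, the ascending sort, and the parameter values are illustrative assumptions.

```python
import numpy as np

def sample_particles(x_local, n_samp_i):
    """Pick roughly n_samp_i particles at a regular stride in local-ID order."""
    stride = max(1, len(x_local) // max(1, n_samp_i))
    return x_local[::stride]

def split_boundaries(x_samples_all, n_sub, h, x_min):
    """Given gathered sample x-coordinates, return interior subdomain boundaries
    in x, each rounded to the nearest control-point (grid) coordinate."""
    xs = np.sort(x_samples_all)                       # sorted by x (ascending here)
    n_samp = len(xs)
    bounds = []
    for i_x in range(1, n_sub):
        last_id = int(n_samp * i_x / n_sub) - 1       # last sample of the i_x-th part
        x_cut = 0.5 * (xs[last_id] + xs[last_id + 1]) # between "last" and "last"+1
        bounds.append(x_min + round((x_cut - x_min) / h) * h)  # snap to grid
    return bounds

# toy example: 4 processes, decomposition into N_x^sub = 4 subdomains in x
rng = np.random.default_rng(0)
x_per_proc = [rng.uniform(0.0, 6.0, n) for n in (40_000, 25_000, 10_000, 5_000)]
t_cpu = np.array([4.0, 2.5, 1.0, 0.5])                # measured CPU times t_i
N_tot, R = sum(len(x) for x in x_per_proc), 1.0e-4
f = t_cpu / t_cpu.sum()                               # assumed correction factor f_i
samples = np.concatenate([sample_particles(x, int(N_tot * R * fi))
                          for x, fi in zip(x_per_proc, f)])
print(split_boundaries(samples, n_sub=4, h=0.02, x_min=0.0))
```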
After determining the subdomains in all directions, we reassign the information of the cells, control points and particles belonging to each subdomain to the associated process. In this way, the computational load can be shared almost equally among all processes. To further aid in understanding Algorithm 2, the procedure is illustrated in Fig. 2 for the case where the whole domain is divided into 4, 3, and 5 subdomains in the x, y, and z directions, respectively. The boundaries of adjacent rectangular subdomains may consequently be offset from each other, which complicates communication in the B-spline-based MPM, as discussed below, and is highlighted as a novelty of this study.
Fig. 2
Process of domain decomposition using DLB technique based on the sampling method
3.2 Communication between cores using B-spline basis functions
As mentioned in Sect. 2.2, quadratic B-spline basis functions are used for the spatial discretization of the MPM in this study. As illustrated in Fig. 3, when B-spline basis functions are used, the domain decomposition in units of the cells that make up the mesh and the locations of the control points associated with those cells do not necessarily coincide. Each variable is assigned to a control point and is used for the interpolated approximation of that variable within the cells. In particular, the primal variable of the governing equation, the velocity, is updated at the control points using Eqs. (9) and (10). Therefore, care must be taken when mapping to control points near the boundaries of the partitioned regions. Specifically, in order to compute the variables at the control points associated with the boundaries of the domain assigned to a core, the variables of material points belonging to other cores adjacent to that core are also needed. It is therefore necessary to communicate the information of material points that do not belong to a core's own subdomain in order to pass their values to the control points on the boundaries between the corresponding cores.
As an example, a two-dimensional physical mesh divided among four cores is shown in Fig. 4a. The black dots in the figure are the control points used to interpolate and approximate variables in each subdomain, and the additionally circled (double-circled) points are the communication targets. The B-spline basis functions associated with these control points require the mapping of physical quantities from material points belonging to several other subdomains, as illustrated in Fig. 4b, since their domains of influence span different subdomains. In addition, when the DLB described earlier is applied, since the domain decomposition is performed so that almost the same number of material points belongs to each core, the rectangular subdomains may be arranged in a staggered manner, as shown in Fig. 5a. In such cases, communication at the control points between cores assigned to different subdomains becomes more complicated; see Fig. 5b. For example, as shown in Fig. 5c, communication at the control points requires the subdomain corresponding to Core 1 to communicate with Cores 2 and 3 across the same boundary edge. In addition, due to the conditions specific to communication at the control points of the B-spline basis functions described above, communication at the control points is also required between Cores 1 and 4 even though their subdomains are not adjacent.
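To make the communication pattern concrete, the sketch below computes, for each subdomain, the rectangular block of control-point indices whose quadratic B-spline supports overlap its cells. The indexing convention assumed here is that cell c is influenced by control points c, c+1 and c+2 per direction, consistent with 282 = 280 + 2 control points per direction in Table 4; the cell boxes chosen are illustrative. Two cores must communicate whenever their control-point blocks intersect, which can happen even if their cell blocks do not touch, as in Fig. 5c.

```python
def ctrl_point_box(cell_box):
    """Control-point index box needed by a subdomain owning the given cell box.
    For quadratic B-splines, cell c is influenced by control points c, c+1, c+2
    per direction, so the box extends two indices beyond the last owned cell."""
    (cx0, cx1), (cy0, cy1) = cell_box
    return (cx0, cx1 + 2), (cy0, cy1 + 2)

def boxes_intersect(a, b):
    """True if the two index boxes overlap in every direction."""
    return all(lo1 <= hi2 and lo2 <= hi1 for (lo1, hi1), (lo2, hi2) in zip(a, b))

# staggered 2D example in the spirit of Fig. 5: the cell boxes of core 1 and core 4
# only touch diagonally, yet their control-point boxes overlap -> communication needed
core1 = ((0, 9),   (0, 4))      # cells x:0-9,  y:0-4
core4 = ((10, 19), (5, 9))      # cells x:10-19, y:5-9 (diagonally offset from core1)
print(boxes_intersect(core1, core4))                                  # False (cells)
print(boxes_intersect(ctrl_point_box(core1), ctrl_point_box(core4)))  # True
```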
Fig. 3
Schematic diagram of the physical mesh and control points associated with quadratic B-spline basis functions
Fig. 4
Details of domain decomposition using quadratic B-spline basis functions: a subdomain and associated control points for each core; b mapping information of a material point located at different subdomains to overlapped control points
Fig. 5
Schematic diagrams of the staggered physical subdomains between cores with quadratic B-spline basis functions: a distribution of material points; b physical subdomain and corresponding control point domain for each core; c an example where communication between control points is also required between Core 1 and Core 4, which are assigned to non-adjacent subdomains, in an arrangement where the physical subdomains corresponding to Core 2 and Core 1 are misaligned
4 Numerical examples
Several numerical examples are presented to verify the effectiveness of the proposed DLB technique for the MPM with B-spline basis functions and to demonstrate its performance with a view to application to a real-world disaster simulation. Although our method is material-independent and can be adapted to various constitutive laws, we exclusively assume soil as the material here and adopt an elastoplastic constitutive law based on the Drucker–Prager yielding criterion; see Appendix 1. In this study, all numerical examples are simulated on Wisteria/BDEC-01 Odyssey [50], a supercomputer system featuring compute nodes with 48-core Fujitsu A64FX processors (@2.2 GHz), and 32 GiB of memory per node. Note that in all the following calculations, \(R = 10^{-4}\) is used for the sampling rate described in Sect. 3.1.
4.1 Solid column collapse problem
We first verify the effectiveness of the DLB technique by simulating a benchmark problem, in which a quasi-3D solid column packed inside a rectangular container collapses as the constraints on the sides of the container are instantly released. Figure 6 shows the schematic diagram of the analysis model, in which the container is 6 m in length and 3 m in height. The computational domain is covered by a background grid consisting of \(45,000 \, (=300\times 150\times 1)\) cubic cells, and the solid column is represented by 80,000 evenly spaced solid material points. Each cell has a side length of 0.02 m and contains \(16 \, (=4\times 4\times 1)\) solid material points. The material parameters used in this simulation are provided in Table 1. The non-slip condition is imposed on the bottom surface, and the two sides and the top surface are stress-free. Also, appropriate constraint conditions are applied to both faces parallel to the xy-plane to realize the quasi-3D setting. The initial stresses in the soil are calculated by linearly increasing the gravity acceleration for 1 s. In this process, the soil is assumed to be elastic by giving it a sufficiently large cohesion. After that, setting the initial time at \(t=0.0\) s, a cohesion of 0 kPa is instantaneously assigned to all the material points. The real time of the phenomenon covered by this simulation is 1 s, and the time increment is set at \(1.0\times 10^{-4}\) s throughout the calculation. In addition, the interval step for DLB, \(t_\textrm{DLB}^\textrm{int}\), is set at 100 steps.
Fig. 6
Schematic diagram of a quasi-3D solid column
Table 1
Simulation parameters for quasi-3D solid column collapse problem
Solid density, \(\rho\) (\(\mathrm {kg/m^3}\)): 1500
Young's modulus, E (MPa): 10
Poisson's ratio, \(\nu\) (–): 0.3
Internal friction angle, \(\phi\) (\(^\circ\)): 30
Dilatancy angle, \(\psi\) (\(^\circ\)): 0
Cohesion, c (kPa): 0
Gravity acceleration, g (\(\mathrm {m/s^2}\)): \(-9.81\)
Fig. 7
Snapshot of collapsing process of the quasi-3D solid column at representative times (\(t =[0.0, 0.2, 0.4, 0.6, 0.8, 1.0]\) s): a without DLB; b with DLB
Figure 7 shows snapshots of the collapse process of the quasi-3D solid column at representative times (\(t =[0.0, 0.2, 0.4, 0.6, 0.8, 1.0]\) s). Here, the subdomains allocated to the 48 cores are shown color-coded. The upper row shows the simulation results without DLB (non-DLB case), and the lower row displays the results with the proposed DLB (DLB case). In the non-DLB case, the spatially fixed mesh is decomposed into \(8 \times 6\) subdomains, and the material points are distributed over only 8 cores at the initial stage (\(t=0.0\) s). This decomposition pattern remains the same throughout the simulation, causing severe load imbalances. That is, under the action of gravity, the soil column moves downward and spreads to both sides, but since the core assignment is fixed in the non-DLB case, the material points remain concentrated in only a few cores, as shown in Fig. 7a. Here, only a few colors appear because material points are present only in the cores with those IDs, and cores without material points are not visible. Therefore, an imbalance persists throughout the calculation. On the other hand, in the DLB case, only the cells in which material points are present are subject to domain decomposition, so the material points are initially distributed almost evenly over the 48 cores. During the collapse process, the core assignment changes according to the material point distribution, so the material points remain approximately balanced among all cores until the end. This results in higher computational efficiency compared to the non-DLB case. Indeed, in this example (with 48 cores), the non-DLB analysis requires approximately 3529 s to complete the 1 s simulation, whereas the DLB analysis takes only about 529 s; the former is approximately 6.67 times the latter.
4.2 Scalability and parallel efficiency for quasi-3D problem
To further explore the strong scaling and parallel efficiency of the proposed DLB for the MPM with B-spline basis functions, we conducted additional numerical tests utilizing 14 different core configurations for the same problem size. Calculations with 1, 6, 12, 24, 36, and 48 cores use a single node, while calculations with 60, 72, 84, 96, 108, 120, 132, and 144 cores use multiple nodes. Note that each node of the Wisteria/BDEC-01 has 48 cores.
The elapsed (i.e., wall-clock) times for both the non-DLB and DLB analyses are summarized in Fig. 8. Here, the parallel efficiency is detailed by dividing the elapsed times into the time spent on MPI communication and the time spent on processing other than MPI communication. Since there are variations in the elapsed times among the cores, only the minimum, maximum and average elapsed times among all MPI processes are plotted in these figures. As can be seen from this figure, the average elapsed time for both analyses decreases as the number of cores increases, and the DLB analysis shows significantly higher performance compared to the non-DLB analysis. Here, the maximum ratio of the average elapsed time of the non-DLB case to that of the DLB case is approximately 7.15 with 36 cores, the minimum value is 4.18 with 12 cores, and the average value excluding the single-core case is 5.95. As can also be seen from this figure, the gap between the minimum and maximum elapsed times of the DLB case is much smaller than that of the non-DLB case. In other words, employing DLB significantly reduces the maximum elapsed time for communication and suppresses the load bias among cores. Thus, it is clear that the proposed DLB technique offers significant advantages in improving computational efficiency.
Tables 2 and 3 compare the parallel efficiency between the cases with and without DLB on a single node and multiple nodes, respectively. As the number of cores increases, the parallel efficiency for the non-DLB analyses decreases significantly, maintaining a low efficiency between 10 and 20%. These results indicate poor scaling behavior when the computation is distributed across more cores without DLB. On the other hand, the implementation of the proposed DLB shows a substantial enhancement in efficiency. Even with the number of cores scaling up to 144, the parallel efficiency surpasses 55%. These results demonstrate the effectiveness of the proposed DLB in preserving high parallel efficiency even as the number of cores increases, thereby enhancing resource utilization.
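For reference, the strong-scaling parallel efficiency reported in Tables 2 and 3 is presumably the standard definition relative to the single-core run (stated here as an assumption, since the formula is not spelled out in the text):
$$\begin{aligned} E_p = \frac{T_1}{p \, T_p} \times 100\,\%, \end{aligned}$$
where \(T_1\) is the elapsed time on a single core, \(T_p\) is the elapsed time on p cores for the same problem, and values close to 100% indicate ideal strong scaling.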
As mentioned before, to reduce the cell-crossing error, the proposed method employs quadratic B-spline basis functions instead of linear basis functions. This is directly related to the novelty of this study. Specifically, the control points associated with the quadratic B-spline basis functions belonging to one subdomain are not necessarily located on its boundary or on the cell boundaries belonging to it, but may also lie outside of it. Likewise, the control points belonging to an adjacent subdomain that need to communicate with these control points are located both inside that subdomain and on its boundary. This introduces complexity into the parallel computation algorithm and the dynamic load balancing implementation, as schematized in Figs. 4 and 5. Therefore, the proposed parallel scheme itself has no advantage over existing ones in terms of improved computational efficiency, but it does have an advantage in terms of reducing the cell-crossing error inherent in MPMs, which has been discussed in many studies; see, for example, Reference [44]. In other words, this advantage is not directly related to the new parallel scheme proposed in this study, but to the interpolation method.
In this context, it is worthwhile to compare the computational costs of using quadratic B-spline basis functions and linear basis functions for DLB-accelerated MPMs (hereafter referred to as quadratic and linear MPMs). We simulated the same quasi-3D solid column collapse problem with the linear MPM and compared the elapsed time with that of the quadratic MPM simulations. Figure 9 shows the changes in the minimum, maximum and average elapsed times for the linear and quadratic MPMs as the number of cores is changed. As before, the elapsed times are divided into the communication and non-communication parts. As seen in this figure, the elapsed times of the quadratic MPM are slightly longer than those of the linear MPM. Unlike the linear basis function, the quadratic B-spline basis function achieves \(C^1\) continuity within a cell, but at the cost of a larger number of active control points. This also results in longer elapsed times for communication and other operations. However, the maximum ratio of the average elapsed time of the quadratic case to that of the linear case is approximately 1.69, and the average value excluding the single-core case is 1.57. Since the quadratic B-spline basis functions have a higher order and more degrees of freedom, the computation time of the quadratic MPM is naturally longer, but it remains of the same order as that of the linear MPM. Judging from the above, and considering the superiority of the quadratic B-spline basis functions in terms of computational stability, the advantages of the proposed DLB-accelerated quadratic MPM are obvious.
Fig. 8
Minimum, maximum and average elapsed times among all MPI processes versus number of cores when analyzed with DLB-accelerated MPM and standard MPM using quadratic B-spline basis functions
Table 2
Parallel efficiency on a single node for quasi-3D solid column collapse simulation
Number of cores: 1, 6, 12, 24, 36, 48
Non-DLB (%): 100, 20.6, 21.1, 13.9, 11.2, 11.4
DLB (%): 100, 94.5, 91.2, 87.3, 82.8, 78.6
Table 3
Parallel efficiency on multiple nodes for quasi-3D solid column collapse simulation
Number of cores: 60, 72, 84, 96, 108, 120, 132, 144
Non-DLB (%): 11.2, 10.9, 10.8, 10.7, 10.6, 10.9, 10.6, 10.4
DLB (%): 76.8, 73.2, 70.8, 67.9, 66.8, 62.2, 61.2, 58.2
Fig. 9
Minimum, maximum and average elapsed times among all MPI processes versus number of cores when analyzed with DLB-accelerated MPM using linear and quadratic B-spline basis functions
Fig. 10
Numerical model for a full 3D solid cylinder collapse problem
4.3 Scalability and parallel efficiency for full-3D problem
To further confirm the performance of the proposed method, a full 3D simulation of a cylinder collapsing in a rectangular analysis domain was performed. Figure 10a and b show the top and side views of the numerical model. The domain size is \(7.0\,\textrm{m}\,\times \,7.0\,\textrm{m}\,\times \,3.0\,\textrm{m}\), corresponding to the length, width, and height, respectively. The results of the non-DLB analysis are used as the reference solution for the DLB analysis. In both analyses, 288 cores are used across multiple nodes. The information about the prepared meshes and material points is summarized in Table 4. The simulation parameters are the same as in the above example. The non-slip condition is applied on the bottom surface. In this setting, material points are not expected to reach the side walls, so the boundary conditions there do not affect the results. The real time of the phenomenon covered by this simulation is 1 s, and the time increment is set at \(1.0\times 10^{-4}\) s throughout the calculation. In addition, the interval step for DLB, \(t_\textrm{DLB}^\textrm{int}\), is set at 1000 steps.
Table 4
Information of full 3D solid cylinder model
Number of meshes: \(9,408,000 \, (=280\times 280\times 120)\)
Size of mesh (\(\textrm{m}\)): \(0.025\times 0.025\times 0.025\)
Number of solid material points: 2,712,959
Number of solid material points per cell: \(27 \, (=3\times 3\times 3)\)
Number of control points: \(9,701,928 \, (=282\times 282\times 122)\)
Fig. 11
Snapshot of collapsing process of full-3D solid cylinder collapse problem
Figure 11a and b show the numerical results obtained by the non-DLB and DLB analyses, respectively. Here, the color corresponds to the core ID assigned to each subdomain. As can be seen from these figures, both simulations show the same behavior. In the non-DLB case, the material points are distributed over only some of the cores during the calculation, while in the DLB case they are almost evenly distributed among all cores. The non-DLB analysis requires about \({138,037}~\textrm{s} \, (\approx {38.3}~\textrm{h})\), whereas the DLB analysis takes only about \({6,886}~\textrm{s} \, (\approx {1.91}~\textrm{h})\). The computation time of the DLB analysis is thus about 1/20 of that of the non-DLB analysis, indicating that the proposed DLB is highly computationally efficient. This highlights the potential offered by the proposed DLB technique for large-scale simulations.
Considering the spatial scale of actual natural disasters, which can be hundreds or even thousands of times larger than the experimental scale, the introduction of the proposed DLB technique into the B-spline based MPM is highly significant and is expected to allow large-scale landslides to be handled effectively.
4.4 Full-scale landslide simulation
Leveraging the performance of the proposed DLB technique, a landslide simulation is carried out on the full-scale virtual slope shown in Fig. 12. Unconsolidated soil with a high air content (a porosity of about 0.756) is assumed as the soil material comprising the slope.
To avoid an unnecessarily large number of degrees of freedom, the number of material points is reduced with reference to the decay depth, as described below. First, based on the elevation, \(2 \times 2\) particles are placed in a 1 m cubic grid. Next, to reduce the number of material points, the top surface is divided into \(100\,\times \,100\) sub-regions (clustering the DEM data) in the x and z directions, and material points are placed from the cell with the smallest elevation in each sub-region down to the cell approximately 11 m below. The top 8 m then constitutes the surface layer and the material below it the base layer.
Fig. 12
Simulation model of a full-scale disaster simulation
Finally, the boundary conditions are set as follows: in the vertical direction, the velocity of the control points at the bottom is set to 0 m/s; in the other directions, the velocity of the control points on the base layer is set to 0 m/s. In addition, a non-slip condition is applied on the side edges of the analysis domain.
The material parameters are shown in Table 5, and the gravity acceleration g is \(-9.81\,\mathrm {m/s^2}\). \(2\times 2\times 2\) material points are placed in each cell with 1 m sides, and the simulation domain is 450 m \(\times\) 125 m \(\times\) 320 m. Thus, the numbers of control points and particles are 18,484,008 and 19,150,323, respectively. In addition, the quadratic B-spline function is used as the basis function, and the time increment is set to \(\Delta t=1.0 \times 10^{-3}\) s. The interval step for DLB, \(t_\textrm{DLB}^\textrm{int}\), is set at 5000 steps. To obtain the initial state of the slope, a quasi-static analysis is conducted for 10,000 steps by increasing gravity from 0 to 9.8 m/s\(^2\). Moreover, as a hypothetical earthquake, body forces in the north–south, east–west and up–down directions are input as sinusoidal accelerations with an amplitude of 600 gal and a frequency of 1 Hz, applied from 10 to 90 s. The simulation uses Wisteria/BDEC-01 with \(48\,\times \,4\,\times \,30 = 5760\) CPU cores.
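A minimal sketch of the assumed input motion for one component is given below (600 gal corresponds to 6.0 m/s\(^2\); the exact phasing of the sinusoid and the absence of a ramp are assumptions made for illustration).

```python
import numpy as np

def quake_acc(t, amp=6.0, freq=1.0, t_on=10.0, t_off=90.0):
    """One component of the hypothetical earthquake: a 600 gal (= 6.0 m/s^2)
    sinusoid at 1 Hz, applied only between t_on and t_off seconds."""
    t = np.asarray(t, dtype=float)
    return np.where((t >= t_on) & (t <= t_off),
                    amp * np.sin(2.0 * np.pi * freq * (t - t_on)), 0.0)

t = np.arange(0.0, 100.0, 1.0e-3)   # matches the simulation time increment
a = quake_acc(t)                    # added to the body force b_i at each step
```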
Figure 13 shows snapshots of the change in the elevation distribution, where the basal layer, shown in brown, is set to be immobile. The figure shows that the virtual earthquake shaking caused the surface layer to collapse and flow to the bottom of the slope. The calculation took about 6.3 h (22,745 s) for the 100 s phenomenon. Also, the computation time required to simulate the first 10 s of the phenomenon with and without DLB was 2440 s and 8924 s, respectively, so the proposed MPM analysis with DLB is about 3.66 times faster than the analysis without DLB. The reason why a more significant speedup is not obtained is that there are fewer particles in the thickness direction than in the in-plane directions and the particle motion is relatively small. If the particle motion were larger in all directions, the proposed method would be even more efficient.
Table 5
Simulation parameter for full-scale disaster simulation
Soil density, \(\rho\) (kg/m\(^3\)): surface layer 617, base layer 1500
Young's modulus, E (MPa): surface layer 4.0, base layer \(1.0 \times 10^3\)
Poisson's ratio, \(\nu\) (–): surface layer 0.4, base layer 0.3
Internal friction angle, \(\phi\) (\(^\circ\)): surface layer 30.0, base layer 55.0
Dilatancy angle, \(\psi\) (\(^\circ\)): surface layer 0.0, base layer 55.0
Cohesion, c (kPa): surface layer 5.0, base layer 300.0
Fig. 13
Snapshots of the change in the elevation distribution in the full-scale disaster simulation
5 Conclusions
In this study, a new DLB technique was developed for parallel MPI calculations based on domain decomposition in order to perform large landslide simulations in realistic computation time using the MPM with B-spline basis functions. Despite the use of higher-order B-spline basis functions, whose range of influence spans several cells in contrast to common basis functions, the proposed DLB technique dynamically adjusts the size of the computational subdomains according to the material point distribution so that the material points are almost equally distributed across all cores. This allowed the load bias between cores to be mitigated and the advantages of parallel computation to be fully exploited. Specifically, a novel contribution of this study is the measure taken to ensure proper communication between control points, especially when the area of influence of the B-spline basis functions spans multiple subdomains, even though the physical domains to which the cores are allocated may be staggered or non-adjacent as a result of the domain decomposition.
In the numerical examples, the quasi-3D benchmark solid column collapse problem was tested with 14 different core configurations to verify the effectiveness of the proposed DLB method in terms of scalability and parallelization efficiency. Simulations of the full 3D column collapse problem also demonstrated the applicability of the developed DLB method to large-scale disaster simulation. Finally, to demonstrate the promise and capability of the DLB technique in the MPM algorithm, a simulation of a full-scale landslide disaster was carried out, showing that the technique can handle calculations of practical size.
In the future, we will work on the development of new parallel calculation methods, such as the introduction of the DLB method into the MPM-FEM hybrid method for landslide-triggered tsunami problems, the development of parallel calculation methods using different core distributions for MPM and FEM, and the realization of large-scale fluid calculations by introducing high-performance solvers into FEM.
Acknowledgements
This work was supported by JSPS KAKENHI (Grant Numbers: 23KJ0124, 22H00507 and 19H01094) and JST SPRING (Grant Number: JPMJSP2114). Also, the authors would like to thank the JHPCN project (Project IDs: jh220019 in 2022, jh230070 in 2023, and jh240040 in 2024) for allowing us to use the Wisteria/BDEC-01 Supercomputer System at the Information Technology Center, The University of Tokyo free of charge. In addition, the authors would like to thank Ryoichi Kimura of Tohoku University for his technical assistance with geometry modeling.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix 1: Constitutive model
In the finite strain framework, the deformation gradient is multiplicatively decomposed into elastic and plastic parts as \(F_{ij}=F_{ik}^\textrm{e}F_{kj}^\textrm{p}\), where \(F_{ik}^\textrm{e}\) and \(F_{kj}^\textrm{p}\) are the elastic and plastic components, respectively. Using the deformation gradient, the logarithmic elastic strain in the spatial description \(\varepsilon _{ij}^\textrm{e}\) is represented as:
where \(b_{ij}^\textrm{e}\) is the left elastic Cauchy–Green deformation tensor defined as \(b_{ij}^\textrm{e}=F_{ik}^\textrm{e}F_{jk}^{\textrm{e}}\). The elastic response is represented by the Hencky hyperelastic model, in which the energy function is defined as:
where \(\lambda _{\alpha }^\textrm{e} \, (\alpha =1,2,3)\) are the three principal stretches of the elastic part of the left Cauchy–Green deformation tensor, \(\lambda\) and \(\mu\) are the Lamé constants, and \(J^\textrm{e}\) is the determinant of \(F_{ik}^\textrm{e}\). Also, we use the Drucker–Prager yield criterion to represent the plastic deformation, whose yield function is given as
where \(s_{ij}\) is the deviatoric stress defined as \(s_{ij}=\sigma _{ij}-p\delta _{ij}\), p is the hydrostatic pressure component and \(J_2\) is the second invariant of the deviatoric stress tensor. Also, c is the cohesion, and \(\eta , \, \xi\) are material constants defined as
where \(\phi\) is the internal friction angle. In order to suppress excessive dilatancy, a non-associated flow rule is employed, and the dilatancy angle \(\psi\) is set to zero in this study. The corresponding plastic potential that determines the specific flow rule is obtained by setting the second and third terms on the right-hand side of Eq. (A4) to zero.
In addition, when tensile stress acts on the soil, the soil particles lose contact with each other and the soil loses its effective stress. To simulate this phenomenon, we adopt the following model [51]:
$$\begin{aligned} K = \left\{ \begin{array}{ll} K & \text{ if }~~~\varepsilon ^\textrm{e}_\textrm{v} < 0,\\ 0 & \text{ otherwise }, \end{array} \right. \end{aligned}$$
(25)
where \(\varepsilon ^\textrm{e}_\textrm{v}\) is the volumetric part of logarithmic elastic strain.
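Equation (25) can be transcribed directly as a small guard in the stress update; the function name and the way the result enters the elastic predictor are illustrative assumptions.

```python
def effective_bulk_modulus(K: float, eps_e_vol: float) -> float:
    """Eq. (25): the bulk stiffness is dropped to zero when the volumetric part of
    the logarithmic elastic strain is tensile (>= 0), so that the soil carries no
    effective stress under tension."""
    return K if eps_e_vol < 0.0 else 0.0
```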
Appendix 2: Flow chart
This section presents a flowchart of the computation code for MPM incorporating DLB using quadratic B-spline basis functions (Fig. 14).
Fig. 14
Flow chart of MPM with DLB using the quadratic B-spline