
Open Access 07.01.2025 | Original Article

B-spline-based material point method with dynamic load balancing technique for large-scale simulation

Authors: Soma Hidano, Shaoyuan Pan, Keina Yoshida, Reika Nomura, Yohei Miki, Masatoshi Kawai, Shuji Moriguchi, Kengo Nakajima, Kenjiro Terada

Published in: Engineering with Computers


Abstract

In this study, a dynamic load-balancing (DLB) technique based on a sampling method is developed for MPMs that use higher-order B-spline basis functions, targeting parallel MPI calculations based on domain decomposition, in order to perform large-scale, long-duration landslide simulations in realistic computation time. Higher-order B-spline basis functions have a wider range of influence across cells than standard basis functions, but the DLB technique dynamically adjusts the size of each computational subdomain according to the material point distribution, so that the material points are distributed almost equally across all cores. This mitigates the load bias between cores and allows the advantages of parallel computation to be fully exploited. Specifically, the novel contribution of this study is that the domain decomposition allows for proper communication between control points even if the physical regions assigned to the cores are staggered or non-adjacent, and even if the area of influence of the B-spline basis functions spans multiple subdomains. In the numerical examples, the quasi-3D benchmark solid column collapse problem is computed for multiple core configurations to verify the effectiveness of the DLB method in terms of scalability and parallelization efficiency. A simulation of the full 3D column collapse problem also illustrates the applicability of the proposed DLB method to large-scale disaster simulations. Finally, to demonstrate the promise and capability of the DLB technique in the MPM algorithm, a full-scale landslide disaster simulation is carried out, illustrating that the method can handle calculations of practical size.
Notes
Soma Hidano and Shaoyuan Pan contributed equally to this work.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

Large landslides and debris flows with significant deformation, movement, and destruction of geostructures occur regularly around the world [1–5], posing a major threat to local infrastructure and even leading to severe casualties among the local population. However, the process of such an event is not well understood due to the complicated behavior of geomaterials. Although some model experiments have been reported in the past, it is practically impossible to reproduce actual phenomena at an experimental scale, and the mechanisms at work in experiments and in actual events are not always the same. In recent years, however, improved computer performance and the development of numerical analysis tools have gradually made large-scale disaster simulation possible. There is therefore now a need to develop effective and reliable numerical analysis tools in order to elucidate the mechanisms of phenomena occurring in the field and to make preliminary assessments and predictions of similar events that may occur in the future.
Mesh-based schemes, such as the finite element method (FEM), are the most commonly used numerical methods in the civil engineering industry, and their capabilities have been confirmed in solving a variety of geotechnical engineering problems. However, these schemes require complex remeshing algorithms and are not well suited to solving complex large-deformation problems. On the other hand, meshless or particle-based methods, such as the smoothed particle hydrodynamics (SPH) method [6–10], the particle finite element method (PFEM) [11–14] and the material point method (MPM) [15–21], have rapidly developed and are now widely used due to their robust handling of large deformations. Notable among these methods is the MPM, which has contributed to remarkable progress especially in the geotechnical engineering field and has become the method of choice for many researchers.
MPM is a hybrid Lagrangian–Eulerian method originally proposed by Sulsky et al. [22] to solve continuum problems based on solid mechanics, while it stemmed from particle-in-cell (PIC) [23, 24] and fluid-implicit-particle (FLIP) [25] methods. Lagrangian material points and an Eulerian background grid are introduced for the representation of a continuum body to make use of the merits of both descriptions. The continuum body is set to be composed of a finite number of material points, which are referred to as “particles” and tracked throughout the whole deformation process. Each particle is given material information such as mass, volume, position, velocity, stress, deformation gradient, and history-dependent variables. On the other hand, the background grid, which can be called “mesh” in analogy with a FE mesh, covers a computational domain and is spatially fixed through the entire calculation time. In essence, MPM can be regarded as a variant of FEM, except that numerical integration is performed at material points that move over time, in contrast to fixed quadrature points in the FEM. Therefore, the advantages of FEM will be retained. Also, unlike PFEM, MPM does not require remeshing to avoid severe mesh distortion thanks to the combined use of the Lagrangian description and Eulerian grids. Moreover, unlike SPH, there is no need to search for neighbor particles for MPM, reducing computational costs.
MPM uses a structured cubic background mesh, which facilitates parallelization through domain decomposition and improves calculation efficiency. In conventional parallel computing, it is common practice to maintain a consistent domain decomposition from the initial state, which is suitable for the Eulerian FEM. In the Eulerian FEM [20, 26, 27], where the mesh remains fixed and only nodal calculations are considered, preserving the initial partition state has minimal impact on the calculation efficiency. However, MPM introduces a unique challenge. That is, unlike the Eulerian FEM, MPM needs to take into account not only nodal calculations but also the movement of material points between the partitioned domains. As the calculation progresses, material points move, and a situation can arise in which material points are dense in some areas and almost nonexistent in others, leading to large disparities in computing efficiency between the partitioned domains. This imbalance prevents the benefits of parallel computing from being fully exploited.
To address this problem and mitigate computational imbalances as much as possible, especially in scenarios involving large-scale, long-duration phenomena, several high-performance computing (HPC) techniques for particle methods [28, 29] have been developed in recent years. In this context, we should turn our attention to large-scale simulations carried out in other research fields. For example, N-body simulations in astrophysics require a huge number of particles to represent various systems, such as globular clusters, galaxies, clusters of galaxies, and the large-scale structure of the Universe. Advanced domain decomposition with dynamic load balancing is essential for achieving high performance on supercomputers, as self-gravity drives the anisotropic evolution of structures. Various types of domain decomposition (orthogonal recursive bisection [30], multi-section [31], and space-filling curve (SFC) based decomposition [32]) and dynamic load balancing (DLB) (load balancing based on work estimation [33] and on the sampling method [34]) have been proposed in previous studies. Such techniques are known to have contributed to the enormous performance of the world’s fastest supercomputers of their time [35–37]. MPM is therefore also expected to benefit from these continued, long-term developments in astrophysics.
In this context, several studies have recently applied dynamic load balancing to MPMs; see, for example, Kumar et al. [38] using CPUs and Dong and Grabe [39], Zhang et al. [17], and Hu et al. [40] using GPUs. In particular, the previous example by Zhang et al. [17] applies this method to a real disaster to achieve a large-scale simulation. Their study features the use of GIMP basis functions in the MPM to reduce the “cell-crossing error” associated with the accuracy of spatial integration. Here, the cell-crossing error refers to the numerical error that occurs when a material point crosses a background cell boundary due to the mere \(C^0\) continuity of the linear basis functions employed in standard MPMs, while GIMP achieves \(C^1\) continuity near the cell boundary. Another strategy to address this is to use more versatile B-spline basis functions, which can be extended to the extended B-spline basis functions [41–43]. These bases are effective in preventing a loss of integration accuracy in regions with a small number of material points. To our knowledge, this is the first time that the dynamic load balancing technique has been incorporated into the MPM enhanced by B-spline basis functions to achieve higher computational efficiency for large-scale problems.
Moreover, it is important to highlight that although many MPM algorithms use GIMP as a basis function to overcome the “cell-crossing error” between the background cells, the accuracy of results obtained using GIMP is not as good as that of B-splines. Since GIMP basis functions only achieve \(C^1\) continuity at the cell edges and still use a linear basis function inside the cell, their gradient is constant within the cell, which significantly affects numerical accuracy. On the other hand, higher-order B-spline basis functions achieve \(C^1\) continuity both at the cell edges and within the cell, ensuring at least \(C^0\) continuity of the gradients and thereby greatly improving numerical accuracy. For more detailed information, see the paper by Zhao et al. [44], which compares the results obtained by GIMP and B-spline basis functions through several numerical examples.
Against the above background, in this study, the DLB technique in conjunction with the sampling method is introduced into the MPM using higher-order B-spline basis functions to reduce the load bias between processes and make the algorithm suitable for large-scale simulation. The DLB technique adaptively adjusts the size of the decomposed subdomains according to the number of material points so that the material points are shared as equally as possible among the cores, thereby reducing the imbalance of material points between cores. Nevertheless, when the sampling method is applied to divide the domain, staggered subdomains are inevitably generated. Although such a subdomain arrangement is unfavorable from the point of view of MPI communication, the method is devised so that appropriate communication can take place between the control points of the B-spline basis functions allocated to the cores corresponding to the divided subdomains. In addition, it should be noted that the mapping of physical quantities at material points between such subdomains is also carried out.
The paper is structured as follows: In Sect. 2, the basics of the MPM are presented. In Sect. 3, we provide a detailed description of the DLB technique and how to incorporate it into the MPM, specifically when using quadratic B-spline basis functions. In Sect. 4, several numerical examples are presented to investigate the effectiveness of the DLB technique in improving the computational efficiency of the MPM and to demonstrate its applicability in a full-scale disaster simulation. In Sect. 5, conclusions and future research directions are outlined.

2 Material point method

2.1 Governing equations

We consider a continuum body that occupies domain \(\Omega\). The strong form of the momentum balance equation for the Lagrangian description is written as
$$\begin{aligned} \begin{aligned} \rho a_i=\frac{\partial \sigma _{ij}}{\partial x_j}+\rho b_i \quad \text{ in } \ \Omega , \end{aligned} \end{aligned}$$
(1)
where \(\rho\) is the mass density, \(a_i\) is the acceleration, \(\sigma _{ij}\) is the Cauchy stress, and \(b_i\) is the body force; subscripts \(i, j = 1, 2, 3\) are used to denote components of vectors and tensors. The corresponding weak form of the momentum balance equation is given as
$$\begin{aligned} \begin{aligned}&\int _{\Omega } \left( \omega _i \rho a_i +\frac{\partial \omega _i}{\partial x_j}\sigma _{ij} \right) d\Omega \\&\quad = \int _{\partial \Omega _{\sigma }}\omega _i t_i d\Gamma +\int _{\Omega }\omega _i \rho b_i d\Omega , \end{aligned} \end{aligned}$$
(2)
where \(\omega _i\) is a test function that satisfies the condition \(\omega _i =0\) on Dirichlet boundary \(\partial \Omega _{v}\), and \(t_i\) is the prescribed traction vector acting on Neumann boundary \(\partial \Omega _{\sigma }\).

2.2 Discretization

In the spatial discretization of the MPM, the domain integral of a physical quantity \(\Phi ({\varvec{x}})\) is approximated as follows:
$$\begin{aligned} \int _{\Omega }\Phi ({\varvec{x}})d\Omega \approx \sum _{p=1}^{N^\text {mp}}\int _{\Omega ^{p}} \Phi ({\varvec{x}})d\Omega \approx \sum _{p=1}^{N^\text {mp}}\Phi ({\varvec{x}}^{p})\Omega ^{p}, \end{aligned}$$
(3)
where \(N^\text {mp}\) is the total number of material points and \(\Omega ^{p}\) is the volume associated with material point p that can be identified with position vector \({{\varvec{x}}}_p\). Field variables at a material point, such as mass, velocity, etc. can be approximated by interpolation using the basis functions \(N_{\alpha }({\varvec{x}})\) associated with the Eulerian grid and their values \(\Phi _{\alpha }\) on nodes (or grid points) \(\alpha\) as
$$\begin{aligned} \Phi ({\varvec{x}})\approx \Phi ^h({\varvec{x}})=\sum _{\alpha =1}^{N^\text {n}}N_{\alpha }({\varvec{x}})\Phi _{\alpha }, \end{aligned}$$
(4)
where \(N^\text {n}\) is the total number of nodes. Here, the variables with superscript ‘h’ indicate their approximations in this manner.
In this study, we employ quadratic B-spline basis functions to suppress the cell-crossing error [45, 46] that often arises in the original MPM with linear basis functions. When quadratic B-spline basis functions are used, the points at which variables are computed do not lie on the cell edges; hence, instead of calling them “nodes”, we will call them “control points”, meaning that they control the function profiles. On the other hand, when linear basis functions are used for comparison in subsequent sections, we will continue to refer to “nodes” rather than “control points”. Thus, \(\Phi _{\alpha }\) represents a variable at the control point identified by the subscript \(\alpha\), and \(N^\text {n}\) denotes the number of control points.
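For concreteness, the following minimal sketch (our own Python illustration, not part of the published implementation) evaluates a one-dimensional quadratic B-spline basis function and its gradient for an interior control point on a uniform grid of cell size h, together with the tensor-product construction used for Eq. (4) in three dimensions. The function names, the data layout and the omission of boundary-modified splines are assumptions made here for illustration.

```python
import numpy as np

def quad_bspline_1d(x, x_alpha, h):
    """Quadratic B-spline value and gradient at position x for an interior
    control point located at x_alpha on a uniform grid of cell size h.
    (Boundary-modified splines are omitted in this sketch.)"""
    r = (x - x_alpha) / h
    a = abs(r)
    if a <= 0.5:
        N = 0.75 - r * r
        dN = -2.0 * r / h
    elif a <= 1.5:
        N = 0.5 * (1.5 - a) ** 2
        dN = -(1.5 - a) * np.sign(r) / h
    else:
        N, dN = 0.0, 0.0
    return N, dN

def quad_bspline_3d(xp, x_ctrl, h):
    """Tensor-product 3D basis value and gradient (cf. Eq. (4)) for one control point."""
    vals = [quad_bspline_1d(xp[d], x_ctrl[d], h) for d in range(3)]
    N = vals[0][0] * vals[1][0] * vals[2][0]
    grad = np.array([
        vals[0][1] * vals[1][0] * vals[2][0],
        vals[0][0] * vals[1][1] * vals[2][0],
        vals[0][0] * vals[1][0] * vals[2][1],
    ])
    return N, grad
```

On a uniform grid with control points spaced h apart, these functions form a partition of unity (for example, 0.75 + 0.125 + 0.125 = 1 at a control point), which is the property exploited in the interpolation of Eq. (4).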
The substitution of Eqs. (3) and (4) into Eq. (2) yields the following semi-discrete momentum balance equation:
$$\begin{aligned} {\varvec{M}}{\varvec{a}}_{i}={\varvec{F}}_{\textrm{int}, \, i}+{\varvec{F}}_{\textrm{ext}, \, i}. \end{aligned}$$
(5)
Here, the control point mass matrix is lumped as
$$\begin{aligned} \left[{{\varvec{M}}}\right]_{\alpha }=\sum _{p=1}^{N^\text {mp}}m^{p}N_{\alpha }({{\varvec{x}}}^{p}), \end{aligned}$$
(6)
where \(m^{p}\) is the mass of material point p. Also, the control point values of the internal and external forces are written as, respectively,
$$\begin{aligned} \begin{aligned}&[{\varvec{F}}_{\textrm{int}, \, i}]_{\alpha }=-\sum _{p=1}^{N^\text {mp}}\frac{\partial N_{\alpha }({\varvec{x}}^{p})}{\partial x_j}\sigma _{ij}\Omega ^{p}, \end{aligned} \end{aligned}$$
(7)
$$\begin{aligned} \begin{aligned}&[{\varvec{F}}_{\textrm{ext}, \, i}]_{\alpha }=\sum _{p=1}^{N^\text {mp}}N_{\alpha }({\varvec{x}}^{p})m^{p}b_i+\int _{\partial \Omega }N_{\alpha }({\varvec{x}})t_i({\varvec{x}})d\Gamma . \end{aligned} \end{aligned}$$
(8)
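The particle-to-grid assembly of Eqs. (6)–(8) can then be sketched as follows, assuming the basis routine sketched above, a simple dictionary-based particle container, and only the body-force contribution to Eq. (8). This is a schematic brute-force version for readability; an actual implementation would visit only the control points whose supports contain each particle.

```python
import numpy as np

def p2g_assemble(particles, ctrl_pts, h, gravity):
    """Assemble the lumped mass (Eq. (6)), internal force (Eq. (7)) and the
    body-force part of the external force (Eq. (8)) at the control points.
    'particles' is a list of dicts with keys 'x', 'v', 'm', 'vol', 'sigma';
    'ctrl_pts' is an (Ncp, 3) array of control point coordinates."""
    ncp = len(ctrl_pts)
    M = np.zeros(ncp)
    f_int = np.zeros((ncp, 3))
    f_ext = np.zeros((ncp, 3))
    for p in particles:
        for alpha, xa in enumerate(ctrl_pts):
            N, dN = quad_bspline_3d(p["x"], xa, h)
            if N == 0.0:
                continue
            M[alpha] += p["m"] * N                        # Eq. (6)
            f_int[alpha] -= p["vol"] * (p["sigma"] @ dN)  # Eq. (7)
            f_ext[alpha] += p["m"] * N * gravity          # Eq. (8), body-force term
    return M, f_int, f_ext
```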
The semi-discrete equations derived above are discretized in time, and the numerical solution is obtained at a current time. In what follows, the Lth discrete time is denoted by \(t^L\), and the time increment to the \(L+1\)th time is denoted by \(\Delta t\equiv t^{L+1}-t^{L}\).
The time-discretized form of the momentum balance equation in Eq. (5) can be written as
$$\begin{aligned} {\varvec{M}}^{L+\theta }{\varvec{a}}_{i}^{L+\theta }={\varvec{F}}_{\textrm{int}, \, i}^{L+\theta }+{\varvec{F}}_{\textrm{ext}, \, i}^{L+\theta }, \end{aligned}$$
(9)
which is solved explicitly for \({\varvec{a}}_{i}^{L}\) by setting \(\theta =0\). After the control point acceleration \({\varvec{a}}_{i}^{L}\) is obtained, the control point velocity vector in the \(x_i\)-direction is updated as
$$\begin{aligned} {\varvec{v}}_{i}^{L+1}={\varvec{v}}_{i}^{L}+\Delta t{\varvec{a}}_{i}^{L}, \end{aligned}$$
(10)
and then the velocity and position vectors at each material point are updated as follows:
$$\begin{aligned}&v_{i}^{p, \, L+1}=v_{i}^{p, \, L}+\Delta t\sum _{\alpha =1}^{N^\text {n}} a_{i\alpha }^{L}N_{\alpha }({\varvec{x}}^{p, \, L}), \end{aligned}$$
(11)
$$\begin{aligned}&x_{i}^{p, \, L+1}=x_{i}^{p, \, L}+\Delta t\sum _{\alpha =1}^{N^\text {n}}v_{i\alpha }^{L+1}N_{\alpha }({\varvec{x}}^{p, \, L}). \end{aligned}$$
(12)
Prior to updating the state variables at each material point, we adopt the MUSL procedure [24] to “refine” the control point velocity vector. That is, the following mapping process is additionally performed:
$$\begin{aligned} v_{i\alpha }^{L+1}=\frac{\sum _{p=1}^{N^\text {mp}} m^{p, \, L} \, v_i^{p, \, L+1} \, N_{\alpha }({\varvec{x}}^{p, \, L})}{\sum _{p=1}^{N^\text {mp}} m^{p, \, L} \, N_{\alpha }({\varvec{x}}^{p, \, L})}. \end{aligned}$$
(13)
Then, the increment of the deformation gradient at material point p is evaluated as
$$\begin{aligned} \Delta F_{ij}^{p, \, L+1}=\delta _{ij}+\Delta t \sum _{\alpha =1}^{N^\text {n}}\frac{\partial N_{\alpha }({\varvec{x}}^{p, \, L})}{\partial x_j}v_{i \alpha }^{L+1}, \end{aligned}$$
(14)
and its determinant \(\Delta J^{p, \, L+1}=\det \Delta {\varvec{F}}^{p, \, L+1}\) is used to update its volume as
$$\begin{aligned} \Omega ^{p, \, L+1}&=\Delta J^{p, \, L+1} \Omega ^{p, \, L}. \end{aligned}$$
(15)

2.3 Numerical algorithm

The numerical algorithm for the MPM with the explicit time integration scheme is summarized as follows (a minimal code sketch of one time step is given at the end of this list):
(i)
Map the information carried by material points to the background grid. The control point values of mass and velocity, \(M_{\alpha }^{L}\) and \({\varvec{v}}_{i}^{L}\), are calculated using Eqs. (6) and (13), respectively;
 
(ii)
Solve Eq. (9) for the control point acceleration vector \({\varvec{a}}_i^{L}\);
 
(iii)
Update the velocity and position vectors at material points using Eqs. (11) and (12), respectively;
 
(iv)
Following the procedure provided in Appendix 1, update the Cauchy stresses \(\sigma _{ij}^{L+1}\) by applying the return mapping algorithm, and then compute the logarithmic elastic strain \(\varepsilon _{ij}^{\textrm{e}, L+1}\) at the material points as
$$\begin{aligned} \varepsilon _{ij}^{\textrm{e}, L+1} = \frac{s_{ij}^{L+1}}{2G} + \frac{{p}^{L+1}}{3K}\delta _{ij}, \end{aligned}$$
(16)
where G and K are the shear and the bulk modulus of elasticity, respectively.
 
(v)
Initialize the computational grid for the next time step.
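The following sketch gathers steps (i)–(v) into one explicit time step, assuming the basis routine sketched in Sect. 2.2 and a user-supplied stress-update callable for step (iv). It is a simplified serial illustration (no domain decomposition), and the ordering of the MUSL remap relative to the position and gradient updates follows common MUSL practice rather than reproducing the authors' code verbatim.

```python
import numpy as np

def explicit_mpm_step(particles, ctrl_pts, h, gravity, dt, update_stress):
    """One explicit MPM step following steps (i)-(v).  'update_stress' is a
    placeholder for the return mapping of Appendix 1; the data layout is an
    illustrative assumption.  All basis functions are evaluated at the old
    particle positions x^{p,L} (requires quad_bspline_3d from Sect. 2.2)."""
    ncp = len(ctrl_pts)
    # cache basis values/gradients at the old particle positions
    shp = []
    for p in particles:
        entries = []
        for alpha, xa in enumerate(ctrl_pts):
            N, dN = quad_bspline_3d(p["x"], xa, h)
            if N > 0.0:
                entries.append((alpha, N, dN))
        shp.append(entries)
    # (i) particle-to-grid mapping: mass, momentum, forces
    M = np.zeros(ncp); mv = np.zeros((ncp, 3))
    f_int = np.zeros((ncp, 3)); f_ext = np.zeros((ncp, 3))
    for p, entries in zip(particles, shp):
        for alpha, N, dN in entries:
            M[alpha] += p["m"] * N
            mv[alpha] += p["m"] * N * p["v"]
            f_int[alpha] -= p["vol"] * (p["sigma"] @ dN)
            f_ext[alpha] += p["m"] * N * gravity
    act = M > 1.0e-12
    # (ii) Eq. (9) with theta = 0, then Eq. (10)
    a = np.zeros((ncp, 3)); v = np.zeros((ncp, 3))
    a[act] = (f_int[act] + f_ext[act]) / M[act, None]
    v[act] = mv[act] / M[act, None] + dt * a[act]
    # (iii) Eq. (11): particle velocities
    for p, entries in zip(particles, shp):
        p["v"] = p["v"] + dt * sum(N * a[alpha] for alpha, N, _ in entries)
    # MUSL refinement of the control point velocity, Eq. (13)
    mv[:] = 0.0
    for p, entries in zip(particles, shp):
        for alpha, N, _ in entries:
            mv[alpha] += p["m"] * N * p["v"]
    v[act] = mv[act] / M[act, None]
    # Eq. (12), Eq. (14), Eq. (15) and (iv) stress update
    for p, entries in zip(particles, shp):
        p["x"] = p["x"] + dt * sum(N * v[alpha] for alpha, N, _ in entries)
        dF = np.eye(3) + dt * sum(np.outer(v[alpha], dN) for alpha, _, dN in entries)
        p["F"] = dF @ p["F"]
        p["vol"] *= np.linalg.det(dF)
        update_stress(p, dF)
    # (v) grid quantities are rebuilt from scratch at the next step
```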
 

3 HPC technique for MPM

A new dynamic load balancing (DLB) method has been developed to realize MPI parallel computing for MPM analyses using B-spline basis functions. It adaptively changes the domain decomposition pattern so that the CPU load is equalized as the material points move. The DLB concept itself is straightforward, and previous studies have applied it to the MPM employing GIMP, which is based on linear interpolation and achieves only \(C^1\) continuity at cell boundaries [38–40]. It should be noted that this study focuses primarily on the DLB method incorporated into the MPM using quadratic B-spline basis functions; we have not compared it with the MPM with GIMP basis functions, since the GIMP scheme is not implemented in our program code, and this comparison therefore remains open for further discussion. Nevertheless, this is the first study to ensure proper communication during parallelization when using B-spline basis functions, which have various advantages over GIMP.

3.1 Dynamic load balancing (DLB) technique

In conventional parallel computing, it is common practice to retain the initial domain decomposition, which is suitable for Eulerian mesh-based methods. However, MPM necessitates consideration of not only control point calculations but also the movement of material points between partitioned domains. As illustrated in Fig. 1, the initial mesh partitioning appears to be balanced, but can become unbalanced if material points move significantly over time. In the example in this figure, as the calculation progresses, the material points move and become concentrated in the domains assigned to cores 1 and 2, leaving the minimal number of material points required for the calculation in the domains assigned to cores 3 and 4. Consequently, the calculation time for cores 1 and 2 significantly exceeds that of cores 3 and 4, resulting in a substantial disparity in computing efficiency. This imbalance prevents the full benefits of parallel computing from being exploited. To address this issue and mitigate the imbalance in computation and communication as much as possible, especially in scenarios involving large-scale and long-duration phenomena, we adopted the dynamic load balancing (DLB) technique proposed by [47], which dynamically changes the domain decomposition as needed to ensure that particles are equally distributed to all subdomains. The implementation was initially developed for the gravitational octree code GOTHIC [48, 49] and adjusted for the MPM in this study.
In domain decomposition for particle methods such as SPH, there are generally no constraints on the shape of each subdomain. However, in space-filling curve (SFC)-based decomposition, bumpy surfaces between the decomposed subdomains are a natural outcome and are unsuitable for the grid-based calculations in MPM. Indeed, since equation solving and variable communication in MPM take place at the control points (or nodes) that make up the grid, it is preferable for each subdomain to be rectangular in shape.
The overall algorithm for dynamically dividing the entire domain into rectangular subdomains and then performing parallel computations is outlined in Algorithm 1 in the form of pseudo code, and the overall flowchart is provided in Appendix 2. Here, the DLB explained below corresponds to Lines A2-1 and A2-2, while the MPM explained in Sect. 2 corresponds to Lines A0, A1 and A3. In Line A4, particle sorting is performed according to the Morton order to increase the cache-hit rate [48]. Among these operations, Line A2-1 is the core operation of the DLB and is executed to determine the subdomains based on the sampling method. Algorithm 2 describes its detailed procedure, which is explained in the following. The assumption is that the whole domain is initially divided into \(N_x^\text {sub}\), \(N_y^\text {sub}\) and \(N_z^\text {sub}\) subdomains in the x, y and z directions, respectively, and that the number of these divisions is fixed throughout the simulation. Since the movement of particles within a single time step is not supposed to be very large, the DLB operations need only be applied once every several time steps (corresponding to a fixed time interval). That is, DLB is performed only when mod(\(t^L\), \(t^\textrm{int}_\textrm{DLB}\)) = 0, where \(t^L\) is the current time and \(t^\textrm{int}_\textrm{DLB}\) is the time interval. In the other time steps, we only communicate the information of particles that cross subdomain boundaries. In a DLB time step, the pattern of domain decomposition is determined by the sampling method described in Algorithm 2.
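As a small illustration of this scheduling, the main loop can decide between a full repartitioning step and the ordinary particle-migration step as sketched below. The step-based trigger and the routine names are assumptions made for illustration; the paper states the condition in terms of the elapsed time \(t^L\) and the interval \(t^\textrm{int}_\textrm{DLB}\).

```python
def run(n_steps, dlb_interval, repartition_by_sampling, migrate_crossing_particles, mpm_step):
    """Schematic main loop: a full sampling-based repartitioning (Algorithm 2)
    is triggered only every 'dlb_interval' steps; in the remaining steps only
    particles that crossed a subdomain boundary are exchanged.  All callables
    are placeholders supplied by the caller."""
    for step in range(n_steps):
        if step % dlb_interval == 0:
            repartition_by_sampling()      # Lines A2-1/A2-2: redraw the subdomains
        else:
            migrate_crossing_particles()   # hand over boundary-crossing particles only
        mpm_step()                         # Sect. 2: P2G, grid solve, G2P, stress update
```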
A specific task in Algorithm 2 is to decide where the boundaries of the subdomains should be defined according to the computational load of each MPI process. To do this, the number of sampled particles in the ith MPI process is first calculated by the following formula:
$$\begin{aligned} {n_{i}^\textrm{samp}} = N_\textrm{tot} R f_i, \end{aligned}$$
(17)
where \(N_\textrm{tot}\) is the total number of particles in the entire domain and R is the global sampling rate. Also, \(f_i\) is the correction factor for the ith MPI process and can be calculated as
$$\begin{aligned} f_i = \frac{t_i}{\sum _j{t_j}}, \end{aligned}$$
(18)
where \(t_i\) is the CPU time measured on the ith MPI process. If the computational cost is proportional to the number of particles, \(f_i\) is equivalent to \(N_i/N_\textrm{tot}\), with \(N_i\) being the number of particles in the ith subdomain.
First, for each subdomain, particles are sampled every \(N_i / n_{i}^\textrm{samp}\) particles in local ID order, and once sampling has been completed for all subdomains, the sampled particles are sorted in each process (subdomain) in order of decreasing x-coordinate. Second, the root process, which is assigned to a designated subdomain, gathers the number and the x-coordinates of the sampled particles from all other processes. Third, the total number of sampled particles, \(n^\text {samp}\), is calculated. Fourth, the value \(n_x^\text {new}=n^\text {samp}/N_x^\text {sub}\) determines the approximate number of sampled particles in each of the new subdomains to be generated in the x-direction only. Fifth, in the root process, the minimum (left-hand boundary position) and maximum (right-hand boundary position) of the x-coordinates of each new subdomain are determined according to this number \(n_x^\text {new}\). This means that the whole domain is divided into \(N_x^\text {sub}\) subdomains in the x-direction only, and each new subdomain should have approximately the same number of sampled particles with respect to the x-axis. Here, the last particle ID of the \(i_x\)th subdomain is determined from \(n^\textrm{samp}\times ((i_{x}+1)/N_x^\textrm{sub})\), and the boundary between the \(i_x\)th and \((i_{x}+1)\)th subdomains is identified as lying between the “last particle ID” and the “last particle ID”+1. The same operation is performed for the y- and z-directions separately to determine the boundaries of the subdomains in each direction. It should be noted that when dividing in the y-direction, the sampled particles in each subdomain generated in the x-direction only are sorted in order of decreasing y-coordinate. Note that each subdomain boundary is rounded to the nearest control point to facilitate linkage with the grid-based calculation of the MPM.
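The root-process part of this procedure can be sketched for one coordinate direction as follows (our own Python illustration): Eqs. (17) and (18) determine how many particles each process samples, and the gathered, sorted sample coordinates are then cut into groups of nearly equal size, with each cut rounded to the background grid. The MPI gather is replaced here by an explicit argument, and the rounding rule and function names are assumptions for illustration.

```python
import numpy as np

def sample_counts(n_total, rate, cpu_times):
    """Eqs. (17)-(18): number of particles to sample on each MPI process,
    weighted by the measured CPU time of that process."""
    t = np.asarray(cpu_times, dtype=float)
    f = t / t.sum()
    return np.maximum(1, np.rint(n_total * rate * f).astype(int))

def split_positions(sampled_x, n_sub, cell_size, origin):
    """Determine subdomain boundaries along one axis from the gathered
    coordinates of the sampled particles so that each of the n_sub subdomains
    holds roughly the same number of samples.  Boundaries are rounded to the
    nearest grid line (control point).  This emulates the root-process part
    of Algorithm 2; the gather step is replaced by the 'sampled_x' argument."""
    xs = np.sort(np.asarray(sampled_x, dtype=float))
    n_samp = xs.size
    bounds = []
    for i in range(1, n_sub):
        last = int(round(n_samp * i / n_sub)) - 1          # "last particle ID"
        cut = 0.5 * (xs[last] + xs[min(last + 1, n_samp - 1)])
        grid_cut = origin + round((cut - origin) / cell_size) * cell_size
        bounds.append(grid_cut)
    return bounds   # n_sub - 1 interior boundary positions

# e.g. split_positions(xs, n_sub=4, cell_size=0.02, origin=0.0)
# returns the three interior boundary positions in the x-direction.
```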
After determining the subdomains in all directions, we reassign the information of the cells, control points and particles belonging to each subdomain to an associated process. In this way, the computational load can be shared almost equally among all processes. To further aid understanding of Algorithm 2, the procedure is illustrated in Fig. 2 for the case where the whole domain is divided into 4, 3, and 5 subdomains in the x, y, and z directions, respectively. The boundaries of adjacent rectangular subdomains may consequently diverge from each other, which complicates communication in the B-spline based MPM as discussed below and is highlighted as a novelty of this study.

3.2 Communication between cores using B-spline basis functions

As mentioned in Sect. 2.2, in this study, quadratic B-spline basis functions are used for the spatial discretization of the MPM. As illustrated in Fig. 3, when B-spline basis functions are used, the domain decomposition in units of the cells that make up the mesh and the locations of the control points associated with those cells do not necessarily coincide. Each variable is assigned to a control point and is used for interpolating approximations of that variable within the cells. In particular, the primary variable in the governing equation, the velocity, is updated at the control points using Eqs. (9) and (10). Therefore, care must be taken when mapping to control points at the boundaries of the divided regions. Specifically, in order to compute the variables at control points associated with the boundaries of the domain assigned to a core, the variables of material points belonging to other cores adjacent to that core are also needed. Therefore, it is necessary to communicate the information of material points that do not belong to a given subdomain in order to pass values to control points on the boundaries between the corresponding cores.
As an example, a two-dimensional physical mesh divided among four cores is shown in Fig. 4a. The black dots in the figure are control points used to interpolate and approximate variables in each subdomain, and the double-circled points are the communication target points. The B-spline basis functions associated with these control points require the mapping of physical quantities from material points belonging to several other subdomains, as illustrated in Fig. 4b, since their domains of influence span different subdomains. In addition, when the DLB described earlier is applied, since the domain decomposition is carried out so that almost the same number of material points belong to each core, it is possible that the rectangular subdomains are arranged in alternating directions, as shown in Fig. 5a. In such cases, communication at the control points between cores assigned to different subdomains becomes more complicated; see Fig. 5b. For example, as shown in Fig. 5c, communication at the control points requires the subdomain corresponding to Core 1 to communicate with Cores 2 and 3 on the same boundary edge. In addition, due to the conditions specific to communication at the control points of the B-spline basis functions described above, communication at the control points is also required between Cores 1 and 4 even though their subdomains are not adjacent.
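To illustrate why this communication pattern differs from that of linear basis functions, the following sketch determines, for one coordinate direction, which control-point layers two subdomains must exchange. It assumes the common indexing in which a grid of \(N_\text{c}\) cells per axis carries \(N_\text{c}+2\) quadratic B-spline control points (consistent with Table 4) and cell c is influenced by control points c, c+1 and c+2; the index convention and function names are our own illustrative assumptions.

```python
def ctrl_pt_range(cell_lo, cell_hi):
    """Control point index range [lo, hi) whose quadratic B-spline support
    overlaps the cells [cell_lo, cell_hi) along one axis.  With the indexing
    assumed here, cell c is influenced by control points c, c+1, c+2, so a
    subdomain needs two extra layers of control points beyond its last cell
    (one layer would suffice for linear basis functions)."""
    return cell_lo, cell_hi + 2

def shared_ctrl_pts(cells_a, cells_b):
    """Control point indices that two subdomains must exchange along one axis.
    'cells_a'/'cells_b' are (lo, hi) cell ranges; an empty result means the
    two subdomains do not communicate in this direction."""
    lo_a, hi_a = ctrl_pt_range(*cells_a)
    lo_b, hi_b = ctrl_pt_range(*cells_b)
    lo, hi = max(lo_a, lo_b), min(hi_a, hi_b)
    return list(range(lo, hi)) if lo < hi else []

# Example: one core owns cells [0, 50) and its neighbor owns cells [50, 80):
# shared_ctrl_pts((0, 50), (50, 80)) -> [50, 51], i.e. two layers of shared
# control points, whereas linear basis functions would share only index 50.
```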

4 Numerical examples

Several numerical examples are presented to verify the effectiveness of the proposed DLB technique for the MPM with B-spline basis functions and to demonstrate its performance with a view to application to a real-world disaster simulation. Although our method is material-independent and can be adapted to various constitutive laws, we exclusively assume soil as the material here and adopt an elastoplastic constitutive law based on the Drucker–Prager yielding criterion; see Appendix 1. In this study, all numerical examples are simulated on Wisteria/BDEC-01 Odyssey [50], a supercomputer system featuring compute nodes with 48-core Fujitsu A64FX processors (@2.2 GHz), and 32 GiB of memory per node. Note that in all the following calculations, \(R = 10^{-4}\) is used for the sampling rate described in Sect. 3.1.

4.1 Solid column collapse problem

We first verify the effectiveness of the DLB technique by simulating a benchmark problem, in which a quasi-3D solid column packed inside a rectangular container collapses as the constraints on the sides of the container are instantly released. Figure 6 shows the schematic diagram of the analysis model, in which the container is 6 m in length and 3 m in height. The solid column is covered by a background grid consisting of \(45,000 \, (=300\times 150\times 1)\) cubic cells, with 80,000 solid material points evenly spaced. Each cell has a side length of 0.02 m and contains \(16 \, (=4\times 4\times 1)\) solid material points. The material parameters used in this simulation are provided in Table 1. A no-slip condition is imposed on the bottom surface, and the two sides and the top surface are stress-free. Also, constraint conditions are applied to both surfaces parallel to the xy-plane so that the problem remains quasi-3D. The initial stresses in the soil are calculated by linearly increasing the gravity acceleration for 1 s. In this process, the soil is assumed to be elastic by giving it a sufficiently large cohesion. After that, setting the initial time at \(t=0.0\) s, a cohesion of 0 kPa is instantaneously assigned to all the material points. The physical duration covered by this simulation is 1 s, and the time increment is set at \(1.0\times 10^{-4}\) s throughout the calculation. In addition, the interval step for DLB, \(t_\textrm{DLB}^\textrm{int}\), is set at 100 steps.
Table 1 Simulation parameters for quasi-3D solid column collapse problem

Parameter                                   Symbol      Value
Solid density (\(\mathrm {kg/m^3}\))        \(\rho\)    1500
Young's modulus (MPa)                       E           10
Poisson's ratio (–)                         \(\nu\)     0.3
Internal friction angle (\(^\circ\))        \(\phi\)    30
Dilatancy angle (\(^\circ\))                \(\psi\)    0
Cohesion (kPa)                              c           0
Gravity acceleration (\(\mathrm {m/s^2}\))  g           \(-9.81\)
Figure 7 shows snapshots of the collapse process of the quasi-3D solid column at representative times (\(t =[0.0, 0.2, 0.4, 0.6, 0.8, 1.0]\) s). Here, the subdomains allocated to the 48 cores are shown color-coded. The upper row shows the simulation results without DLB (non-DLB case), and the lower row displays the results with the proposed DLB (DLB case). In the non-DLB case, the spatially fixed mesh is decomposed into \(8 \times 6\) subdomains, and the material points are distributed over only 8 cores at the initial stage (\(t=0.0\) s). This decomposition pattern remains the same throughout the simulation, causing severe load imbalances. That is, under the action of gravity, the soil column moves downward and spreads to both sides, but since the core distribution is fixed in the non-DLB case, the material point movement occurs in only a few cores, as shown in Fig. 7a. Only a few colors appear there, corresponding to the IDs of the cores in which material points are present. Therefore, the imbalance persists throughout the calculation. On the other hand, in the DLB case, only the cells in which material points are present are subject to domain decomposition, so they are initially distributed almost evenly over the 48 cores. During the collapse process, the core distribution changes according to the material point distribution, so that the material points remain approximately balanced among all cores until the end. This results in higher computational efficiency compared to the non-DLB case. Indeed, in this example (with 48 cores), the non-DLB analysis requires approximately 3529 s to complete the 1 s simulation, while the DLB analysis takes only about 529 s. The calculation time of the former is approximately 6.67 times that of the latter.

4.2 Scalability and parallel efficiency for quasi-3D problem

To further explore the strong scaling and parallel efficiency of the proposed DLB for the MPM with B-spline basis functions, we conducted additional numerical tests utilizing 14 different core configurations for the same problem size. The calculations using 1, 6, 12, 24, 36, and 48 cores use a single node, whereas those using 60, 72, 84, 96, 108, 120, 132, and 144 cores use multiple nodes. Note that each node of the Wisteria/BDEC-01 has 48 cores.
The elapsed (wall-clock) times for both the non-DLB and DLB analyses are summarized in Fig. 8. Here, the parallel efficiency is detailed by dividing the elapsed times into the time spent on MPI communication and the time spent on processing other than MPI communication. Since the elapsed times vary among cores, the minimum, maximum and average elapsed times among all MPI processes are plotted in these figures. As can be seen from this figure, the average elapsed time for both analyses decreases as the number of cores increases, and the DLB analysis shows significantly higher performance compared to the non-DLB analysis. Here, the maximum ratio of the “average elapsed time” of the non-DLB case to that of the DLB case is approximately 7.15 with 36 cores, the minimum value is 4.18 with 12 cores, and the average value excluding the single-core case is 5.95. As can also be seen from this figure, the gap between the minimum and maximum elapsed times in the DLB case is much smaller than that in the non-DLB case. In other words, employing DLB significantly reduces the maximum elapsed time of communication and suppresses the load bias among cores. Thus, it is clear that the proposed DLB technique offers significant advantages in improving computational efficiency.
Tables 2 and 3 compare the parallel efficiency between the cases with and without DLB on a single node and multiple nodes, respectively. As the number of cores increases, the parallel efficiency for the non-DLB analyses decreases significantly, maintaining a low efficiency between 10 and 20%. These results indicate poor scaling behavior when the computation is distributed across more cores without DLB. On the other hand, the implementation of the proposed DLB shows a substantial enhancement in efficiency. Even with the number of cores scaling up to 144, the parallel efficiency surpasses 55%. These results demonstrate the effectiveness of the proposed DLB in preserving high parallel efficiency even as the number of cores increases, thereby enhancing resource utilization.
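For reference, the parallel efficiency reported in Tables 2 and 3 is interpreted here as the standard strong-scaling measure (our reading, consistent with the 100% single-core entries),
$$\begin{aligned} E(N_\textrm{core})=\frac{T_{1}}{N_\textrm{core}\,T_{N_\textrm{core}}}\times 100\,\%, \end{aligned}$$
where \(T_{1}\) is the elapsed time on a single core and \(T_{N_\textrm{core}}\) is the elapsed time on \(N_\textrm{core}\) cores for the same problem size.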
As mentioned before, to reduce the cell-crossing error, the proposed method employs quadratic B-spline basis functions instead of linear basis functions. This is directly related to the novelty of this study. Specifically, the control points associated with the quadratic B-spline basis functions of one subdomain are not necessarily located on its boundary or on the cell boundaries belonging to it, but may also lie outside of it. Likewise, the control points belonging to an adjacent subdomain that need to communicate with these control points are located inside that subdomain as well as on its boundary. This introduces complexity into the parallel computation algorithm and the dynamic load balancing implementation, as schematized in Figs. 4 and 5. Therefore, the proposed parallel scheme itself has no advantage over existing ones in terms of improved computational efficiency, but it does have an advantage in terms of reducing the cell-crossing error inherent in MPMs, which has been discussed in many studies; see, for example, Reference [44]. In other words, this advantage is not directly related to the new parallel scheme proposed in this study, but to the interpolation method.
In this context, it is worthwhile to compare the computational costs of using quadratic B-spline basis functions and linear basis functions for DLB-accelerated MPMs (hereafter referred to as quadratic and linear MPMs). We simulated the same quasi-3D solid column collapse problem with the linear MPM and compared the elapsed time with that of the quadratic MPM simulations. Figure 9 shows the changes in the minimum, maximum and average elapsed times for the linear and quadratic MPMs when changing the number of cores. As before, the elapsed times are divided into the communication and non-communication parts. From this figure, the elapsed times of the quadratic MPM are slightly longer than those of the linear MPM. Unlike the linear basis function, the quadratic B-spline basis function achieves \(C^1\) continuity within a cell, but at the cost of a larger number of active control points. This also results in longer elapsed times for communication and other operations. However, the maximum ratio of the “average elapsed time” of the quadratic case to that of the linear case is approximately 1.69, and the average value excluding the single-core case is 1.57. Since the quadratic B-spline basis functions have a higher order and more degrees of freedom, the computation time with the quadratic MPM is naturally longer, but it remains of the same order as that of the linear MPM. Judging from the above, and including the superiority of the quadratic B-spline basis functions in terms of computational stability, the advantages of the proposed DLB-accelerated quadratic MPM are obvious.
Table 2 Parallel efficiency on a single node for quasi-3D solid column collapse simulation

Number of cores    1      6      12     24     36     48
Non-DLB (%)        100    20.6   21.1   13.9   11.2   11.4
DLB (%)            100    94.5   91.2   87.3   82.8   78.6
Table 3 Parallel efficiency on multiple nodes for quasi-3D solid column collapse simulation

Number of cores    60     72     84     96     108    120    132    144
Non-DLB (%)        11.2   10.9   10.8   10.7   10.6   10.9   10.6   10.4
DLB (%)            76.8   73.2   70.8   67.9   66.8   62.2   61.2   58.2

4.3 Scalability and parallel efficiency for full-3D problem

To further confirm the performance of the proposed method, a full 3D simulation of a cylinder collapsing in a rectangular analysis domain is performed. Figure 10a and b show the top and side views of the numerical model. The domain size is \(7.0\,\textrm{m}\,\times \,7.0\,\textrm{m}\,\times \,3.0\,\textrm{m}\), corresponding to the length, width, and height, respectively. The results of the non-DLB analysis are used as the reference solution for the DLB analysis. In both analyses, 288 cores are used on multiple nodes. The information about the prepared meshes and material points is summarized in Table 4. The simulation parameters are the same as in the previous example. A no-slip condition is applied on the bottom surface. In this setting, material points are not expected to reach the sidewalls, so the boundary conditions there do not affect the results. The physical duration covered by this simulation is 1 s, and the time increment is set at \(1.0\times 10^{-4}\) s throughout the calculation. In addition, the interval step for DLB, \(t_\textrm{DLB}^\textrm{int}\), is set at 1000 steps.
Table 4 Information of the full 3D solid cylinder model

Number of cells                             \(9,408,000 \, (=280\times 280\times 120)\)
Size of cell (\(\textrm{m}\))               \(0.025\times 0.025\times 0.025\)
Number of solid material points             2,712,959
Number of solid material points per cell    \(27 \, (=3\times 3\times 3)\)
Number of control points                    \(9,701,928 \, (=282\times 282\times 122)\)
Figure 11a and b show the numerical results obtained by the non-DLB and DLB analyses, respectively. Here, the color corresponds to the core ID assigned to each subdomain. As can be seen from these figures, both simulations show the same behavior. In the non-DLB case, the material points are distributed over only some of the cores during the calculation, while those in the DLB case are distributed almost evenly among all cores. The non-DLB analysis requires about 138,037 s (\(\approx\) 38.3 h) of computation time, whereas the DLB analysis takes only about 6,886 s (\(\approx\) 1.91 h). The computation time of the DLB analysis is thus about 1/20 of that of the non-DLB analysis, indicating that the proposed DLB is highly computationally efficient. This highlights the potential applicability of the proposed DLB technique to large-scale simulations.
Considering the spatial scale of actual natural disasters, which can be hundreds or even thousands of times larger than the experimental scale, the introduction of the proposed DLB technique into the B-spline based MPM is highly significant and is expected to enable the effective handling of large-scale landslides.

4.4 Full-scale landslide simulation

Taking advantage of the performance of the proposed DLB technique, a landslide simulation is carried out on the full-scale virtual slope shown in Fig. 12. Unconsolidated soil with a high air content (a porosity of about 0.756) is assumed as the soil material comprising the slope.
To avoid an unnecessarily large number of degrees of freedom, the number of material points is reduced by reference to the decay depth, as described below. First, based on the elevation, \(2 \times 2\) particles are placed in a 1 m cubic grid. Next, to reduce the number of material points, the top surface is divided into \(100\,\times \,100\) sub-regions (clustering the DEM data) in the x and z directions, and material points are placed from the cell with the smallest elevation point in each sub-region to the cell approximately 11 m below. The top 8 m then constitutes the surface layer, and below that is the base layer.
Finally, the boundary conditions are set as follows: in the vertical direction, the velocity of the control points at the bottom is set to 0 m/s; in the other directions, the velocity of the control points on the base layer is set to 0 m/s. In addition, a no-slip condition is applied on the side edges of the analysis domain.
The material parameters are shown in Table 5, and the gravity acceleration g is \(-9.81\,\mathrm {m/s^2}\). \(2\times 2\times 2\) material points are placed in each cell with 1 m sides, and the simulation area is 450 m \(\times\) 125 m \(\times\) 320 m. Thus, the numbers of control points and particles are 18,484,008 and 19,150,323, respectively. In addition, the quadratic B-spline function is used as the basis function, and the time increment is set to \(\Delta t=1.0 \times 10^{-3}\) s. The interval step for DLB, \(t_\textrm{DLB}^\textrm{int}\), is set at 5000 steps, i.e., the DLB is performed every 5000 steps. To obtain the initial state of the slope, a quasi-static analysis is conducted for 10,000 steps by increasing gravity from 0 to 9.8 m/s\(^2\). Moreover, as a hypothetical earthquake, the body forces in each direction are input as sinusoidal accelerations in the north–south, east–west and up–down components, with an amplitude of 600 gal and a frequency of 1 Hz, applied from 10 to 90 s. The simulation uses the Wisteria/BDEC-01 with \(48\,\times \,4\,\times \,30 = 5760\) CPU cores.
Figure 13 shows snapshots of the change in the elevation distribution, where the basal layer, shown in brown, is set to be immobile. The figure shows that the virtual earthquake shaking caused the surface layer to collapse and flow down to the bottom. The time taken for the calculation was about 6.3 h (22,745 s) for the 100 s phenomenon. Also, the computation time required to simulate the first 10 s of the phenomenon with and without DLB was 2440 and 8924 s, respectively. Thus, the proposed MPM analysis with DLB is about 3.66 times faster than the analysis without DLB. The speedup is less pronounced here because there are fewer particles in the thickness direction than in the in-plane directions, and the particle motion is relatively small. If the particle motion were larger in all directions, the proposed method would be more efficient.
Table 5 Simulation parameters for full-scale disaster simulation

Parameter                               Symbol      Surface layer   Base layer
Soil density (kg/m\(^3\))               \(\rho\)    617             1500
Young's modulus (MPa)                   E           4.0             \(1.0 \times 10^3\)
Poisson's ratio (–)                     \(\nu\)     0.4             0.3
Internal friction angle (\(^\circ\))    \(\phi\)    30.0            55.0
Dilatancy angle (\(^\circ\))            \(\psi\)    0.0             55.0
Cohesion (kPa)                          c           5.0             300.0

5 Conclusions

In this study, a new DLB technique is developed for parallel MPI calculations based on domain decomposition in order to perform large landslide simulations in realistic computation time using the MPM with B-spline basis functions. Despite the use of higher-order B-spline basis functions, whose range of influence across cells is wider than that of common basis functions, the proposed DLB technique dynamically adjusts the size of the computational subdomains according to the material point distribution, so that the material points are distributed almost equally across all cores. This allows the load bias between cores to be mitigated and the advantages of parallel computation to be fully exploited. Specifically, a novel contribution of this study is the measure to ensure proper communication between the control points, especially when the area of influence of the B-spline basis functions spans multiple subdomains, even though the physical domains to which the cores are allocated may be staggered or non-adjacent due to the domain decomposition.
In the numerical examples, the quasi-3D benchmark solid column collapse problem was tested on 14 different core configurations to verify the effectiveness of the proposed DLB method in terms of scalability and parallelization efficiency. Simulations of the full 3D column collapse problem also demonstrated the applicability of the developed DLB method to large-scale disaster simulation. Finally, to demonstrate the promise and capability of the DLB technique in the MPM algorithm, a simulation of a full-scale landslide disaster was carried out, showing that the technique can withstand calculations of practical size.
In the future, we will work on the development of new parallel calculation methods, such as the introduction of the DLB method into the MPM-FEM hybrid method for landslide-triggered tsunami problems, the development of parallel calculation methods using different core distributions for MPM and FEM, and the realization of large-scale fluid calculations by introducing high-performance solvers into FEM.

Acknowledgements

This work was supported by JSPS KAKENHI (Grant Numbers: 23KJ0124, 22H00507 and 19H01094) and JST SPRING (Grant Number: JPMJSP2114). Also, the authors would like to thank the JHPCN project (Project IDs: jh220019 in 2022, jh230070 in 2023, and jh240040 in 2024) for allowing us to use the Wisteria/BDEC-01 Supercomputer System at the Information Technology Center, The University of Tokyo free of charge. In addition, the authors would like to thank Ryoichi Kimura of Tohoku University for his technical assistance with geometry modeling.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendices

Appendix 1: Constitutive model

We employ a standard elasto-plastic constitutive model based on the multiplicative decomposition of the deformation gradient such that
$$\begin{aligned} F_{ij}=F_{ik}^\textrm{e}F_{kj}^\textrm{p}, \end{aligned}$$
(19)
where \(F_{ik}^\textrm{e}\) and \(F_{kj}^\textrm{p}\) are the elastic and plastic components, respectively. Using the deformation gradient, the logarithmic elastic strain in the space description \(\varepsilon _{ij}^\textrm{e}\) is represented as:
$$\begin{aligned} \varepsilon _{ij}^\textrm{e}=\frac{1}{2}\ln {b_{ij}^\textrm{e}} \end{aligned}$$
(20)
where \(b_{ij}^\textrm{e}\) is the left elastic Cauchy–Green deformation tensor defined as \(b_{ij}^\textrm{e}=F_{ik}^\textrm{e}F_{jk}^{\textrm{e}~T}\). The elastic response is represented by the Hencky hyperelastic model, in which the energy function is defined as:
$$\begin{aligned} \begin{aligned} \mathcal {H}(\lambda _{1}^\textrm{e},\lambda _{2}^\textrm{e},\lambda _{3}^\textrm{e})&= \mu \left[ (\ln {\lambda _{1}^\textrm{e}})^2+(\ln {\lambda _{2}^\textrm{e}})^2+(\ln {\lambda _{3}^\textrm{e}})^2\right] \\&\quad +\frac{\lambda }{2}(\ln {J^\textrm{e}})^2, \end{aligned} \end{aligned}$$
(21)
where \(\lambda _{\alpha }^\textrm{e} \, (\alpha =1,2,3)\) are the three elastic principal stretches, i.e., the square roots of the eigenvalues of \(b_{ij}^\textrm{e}\), \(\lambda\) and \(\mu\) are the Lamé constants, and \(J^\textrm{e}\) is the determinant of \(F_{ik}^\textrm{e}\). Also, we use the Drucker–Prager criterion to represent the plastic deformation, whose yield function is given as
$$\begin{aligned} \Phi =\sqrt{J_2(s_{ij})}+\eta p-\xi c, \end{aligned}$$
(22)
where \(s_{ij}\) is the deviatoric stress defined as \(s_{ij}=\sigma _{ij}-p\delta _{ij}\), p is the hydrostatic pressure component and \(J_2\) is the second invariant of the deviatoric stress tensor. Also, c is the cohesion, and \(\eta , \, \xi\) are material constants defined as
$$\begin{aligned}&\eta \equiv \frac{3\tan \phi }{\sqrt{9+12\tan ^2 \phi }}, \end{aligned}$$
(23)
$$\begin{aligned}&\xi \equiv \frac{3}{\sqrt{9+12\tan ^2 \phi }}, \end{aligned}$$
(24)
where \(\phi\) is the internal friction angle. In order to suppress excessive dilatancy, a non-associated flow rule is employed and the dilatancy angle \(\psi\) is set to zero in this study. The corresponding plastic potential that determines the specific flow rule is formed by setting the second and third terms on the right-hand side of Eq. (22) to zero.
In addition, when tensile stress acts on the soil, the soil particles lose contact with each other and the soil loses its effective stress. To simulate this phenomenon, we adopt the following model [51]:
$$\begin{aligned} K = \left\{ \begin{array}{ll} K & \text{ if }~~~\varepsilon ^\textrm{e}_\textrm{v} < 0,\\ 0 & \text{ otherwise }, \end{array} \right. \end{aligned}$$
(25)
where \(\varepsilon ^\textrm{e}_\textrm{v}\) is the volumetric part of logarithmic elastic strain.
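As a concrete illustration of step (iv) in Sect. 2.3 combined with the model above, the following sketch performs an elastic-predictor/plastic-corrector update driven by the trial logarithmic elastic strain, assuming small elastic strains, the deviatoric (\(\psi = 0\)) flow rule, the tension cut-off of Eq. (25), and a simple clamp in place of a proper apex return; the Kirchhoff/Cauchy distinction is also ignored. It is a minimal sketch written for this presentation, not the authors' implementation.

```python
import numpy as np

def drucker_prager_update(eps_e_trial, G, K, phi_deg, c):
    """Elastic predictor / plastic corrector for the Drucker-Prager model with
    a deviatoric (psi = 0) flow rule, written in terms of the logarithmic
    elastic strain (cf. Eqs. (16) and (22)-(25)).  Small-strain form; a simple
    clamp replaces the apex return, and hardening is not considered."""
    eps = np.asarray(eps_e_trial, dtype=float)
    eps_v = np.trace(eps)
    K_eff = 0.0 if eps_v >= 0.0 else K            # tension cut-off, Eq. (25)
    eps_dev = eps - (eps_v / 3.0) * np.eye(3)
    p = K_eff * eps_v                             # hydrostatic pressure
    s = 2.0 * G * eps_dev                         # deviatoric (trial) stress
    J2 = 0.5 * np.tensordot(s, s)                 # second deviatoric invariant
    t = np.tan(np.radians(phi_deg))
    eta = 3.0 * t / np.sqrt(9.0 + 12.0 * t ** 2)  # Eq. (23)
    xi = 3.0 / np.sqrt(9.0 + 12.0 * t ** 2)       # Eq. (24)
    Phi = np.sqrt(J2) + eta * p - xi * c          # yield function, Eq. (22)
    if Phi > 0.0 and J2 > 0.0:
        dgamma = Phi / G                          # radial return in the deviatoric plane
        s *= max(0.0, 1.0 - G * dgamma / np.sqrt(J2))
    sigma = s + p * np.eye(3)                     # updated stress
    eps_e = s / (2.0 * G) + (p / (3.0 * K)) * np.eye(3)   # Eq. (16)
    return sigma, eps_e
```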

Appendix 2: Flow chart

This section presents a flowchart of the computation code for MPM incorporating DLB using quadratic B-spline basis functions (Fig. 14).
References
1. Yin Y, Wang F, Sun P (2009) Landslide hazards triggered by the 2008 Wenchuan earthquake, Sichuan, China. Landslides 6:139–152
2. Cui Y, Bao P, Xu C, Ma S, Fu G (2021) Landslides triggered by the 6 September 2018 Mw 6.6 Hokkaido, Japan: an updated inventory and retrospective hazard assessment. Earth Sci Inform 14:247–258
3. Tanyaş H, Hill K, Mahoney L, Fadel I, Lombardo L (2022) The world's second-largest, recorded landslide event: Lessons learnt from the landslides triggered during and after the 2018 Mw 7.5 Papua New Guinea earthquake. Eng Geol 297:106504
4. Fan X, Xu Q, Scaringi G, Dai L, Li W, Dong X, Zhu X, Pei X, Dai K, Havenith H-B (2017) Failure mechanism and kinematics of the deadly June 24th 2017 Xinmo landslide, Maoxian, Sichuan, China. Landslides 14:2129–2146
5. Zhang Y, Li D, Yin K, Chen L, Xu Y, Woldai T, Lian Z (2018) The July 1, 2017 Wangjiawan landslide in Ningxiang County, China. Landslides 15:1657–1662
6. Huang Y, Zhang W, Xu Q, Xie P, Hao L (2012) Run-out analysis of flow-like landslides triggered by the Ms 8.0 2008 Wenchuan earthquake using smoothed particle hydrodynamics. Landslides 9:275–283
7. Dai Z, Huang Y, Cheng H, Xu Q (2014) 3D numerical modeling using smoothed particle hydrodynamics of flow-like landslide propagation triggered by the 2008 Wenchuan earthquake. Eng Geol 180:21–33
8. Pastor M, Blanc T, Haddad B, Petrone S, Sanchez Morles M, Drempetic V, Issler D, Crosta G, Cascini L, Sorbino G, Cuomo S (2014) Application of a SPH depth-integrated model to landslide run-out analysis. Landslides 11:793–812
9. Zhang W, Shi C, An Y, Yang S, Liu Q (2019) Viscous elastoplastic SPH model for long-distance high-speed landslide. Int J Comput Methods 16(02):1846011
10. Peng C, Li S, Wu W, An H, Chen X, Ouyang C, Tang H (2022) On three-dimensional SPH modeling of large-scale landslides. Can Geotech J 59(1):24–39
11. Monforte L, Arroyo M, Carbonell JM, Gens A (2017) Numerical simulation of undrained insertion problems in geotechnical engineering with the Particle Finite Element Method (PFEM). Comput Geotech 82:144–156
12. Zhang X, Krabbenhoft K, Pedroso DM, Lyamin AV, Sheng D, da Silva MV, Wang D (2013) Particle finite element analysis of large deformation and granular flow problems. Comput Geotech 54:133–142
13. Zhang X, Krabbenhoft K, Sheng D, Li W (2015) Numerical simulation of a flow-like landslide using the particle finite element method. Comput Mech 55:167–177
14. Zhang Y, Zhang X, Nguyen H, Li X, Wang L (2023) An implicit 3D nodal integration based PFEM (N-PFEM) of natural temporal stability for dynamic analysis of granular flow and landslide problems. Comput Geotech 159:105434
15. Dunatunga S, Kamrin K (2015) Continuum modeling and simulation of granular flows through their many phases. J Fluid Mech 779:483–513
16. Xu X, Jin F, Sun Q, Soga K, Zhou GGD (2019) Three-dimensional material point method modeling of runout behavior of the Hongshiyan landslide. Can Geotech J 56(9):1318–1337
17. Zhang W, Wu Z, Peng C, Li S, Dong Y, Yuan W (2024) Modelling large-scale landslide using a GPU-accelerated 3D MPM with an efficient terrain contact algorithm. Comput Geotech 158:105411
18. Yerro A, Soga K, Bray J (2019) Runout evaluation of Oso landslide with the material point method. Can Geotech J 56(9):1304–1317
19. Ying C, Zhang K, Wang Z-N, Siddiqua S, Makeen GMH, Wang L (2021) Analysis of the run-out processes of the Xinlu village landslide using the generalized interpolation material point method. Landslides 18:1519–1529
20.
Zurück zum Zitat Pan S, Nomura R, Ling G, Takase S, Moriguchi S, Terada K (2024) Variable passing method for combining 3D MPM-FEM hybrid and 2D shallow water simulations of landslide-induced tsunamis. Int J Numer Methods Fluids 96(1):17–43MathSciNetCrossRefMATH Pan S, Nomura R, Ling G, Takase S, Moriguchi S, Terada K (2024) Variable passing method for combining 3D MPM-FEM hybrid and 2D shallow water simulations of landslide-induced tsunamis. Int J Numer Methods Fluids 96(1):17–43MathSciNetCrossRefMATH
22.
Zurück zum Zitat Sulsky D, Chen Z, Schreyer HL (1994) A particle method for history-dependent materials. Comput Methods Appl Mech Eng 118:179–196MathSciNetCrossRefMATH Sulsky D, Chen Z, Schreyer HL (1994) A particle method for history-dependent materials. Comput Methods Appl Mech Eng 118:179–196MathSciNetCrossRefMATH
23.
Zurück zum Zitat Harlow FH, Welch JE (1965) Numerical calculation of time-dependent viscous incompressible flow of fluid with free surface. Phys Fluids 8(12):2182–2189MathSciNetCrossRefMATH Harlow FH, Welch JE (1965) Numerical calculation of time-dependent viscous incompressible flow of fluid with free surface. Phys Fluids 8(12):2182–2189MathSciNetCrossRefMATH
24.
Zurück zum Zitat Sulsky D, Zhou S-J, Schreyer HL (1995) Application of a particle-in-cell method to solid mechanics. Comput Phys Commun 87:236–252CrossRefMATH Sulsky D, Zhou S-J, Schreyer HL (1995) Application of a particle-in-cell method to solid mechanics. Comput Phys Commun 87:236–252CrossRefMATH
25.
Zurück zum Zitat Brackbill JU, Ruppel HM (1986) FLIP: A method for adaptively zoned, particle-in-cell calculations of fluid flows in two dimensions. J Comput Phys 65(2):314–343MathSciNetCrossRefMATH Brackbill JU, Ruppel HM (1986) FLIP: A method for adaptively zoned, particle-in-cell calculations of fluid flows in two dimensions. J Comput Phys 65(2):314–343MathSciNetCrossRefMATH
27.
Zurück zum Zitat Pan S, Yamaguchi Y, Suppasri A, Moriguchi S, Terada K (2021) MPM-FEM hybrid method for granular mass-water interaction problems. Comput Mech 68:155–173MathSciNetCrossRefMATH Pan S, Yamaguchi Y, Suppasri A, Moriguchi S, Terada K (2021) MPM-FEM hybrid method for granular mass-water interaction problems. Comput Mech 68:155–173MathSciNetCrossRefMATH
28.
Zurück zum Zitat Zhu G, Hughes J, Zheng S, Greaves D (2023) A novel MPI-based parallel smoothed particle hydrodynamics framework with dynamic load balancing for free surface flow. Comput Phys Commun 284:108608CrossRefMATH Zhu G, Hughes J, Zheng S, Greaves D (2023) A novel MPI-based parallel smoothed particle hydrodynamics framework with dynamic load balancing for free surface flow. Comput Phys Commun 284:108608CrossRefMATH
29.
Zurück zum Zitat Egorova MS, Dyachkov SA, Parshikov AN, Zhakhovsky VV (2019) Parallel SPH modeling using dynamic domain decomposition and load balancing displacement of Voronoi subdomains. Comput Phys Commun 234:112–125CrossRefMATH Egorova MS, Dyachkov SA, Parshikov AN, Zhakhovsky VV (2019) Parallel SPH modeling using dynamic domain decomposition and load balancing displacement of Voronoi subdomains. Comput Phys Commun 234:112–125CrossRefMATH
30.
Zurück zum Zitat Fox GC (1988) A graphical approach to load balancing and sparse matrix vector multiplication on the hypercube. In: Schultz M (eds) Numerical algorithms for modern parallel computer architectures. Springer US, New York, pp 37–61 Fox GC (1988) A graphical approach to load balancing and sparse matrix vector multiplication on the hypercube. In: Schultz M (eds) Numerical algorithms for modern parallel computer architectures. Springer US, New York, pp 37–61
34.
35.
Zurück zum Zitat Ishiyama T, Nitadori K, Makino J (2012) 4.45 Pflops astrophysical \(N\)-body simulation on K computer: the gravitational trillion-body problem. In: Hollingsworth JK (ed.) SC Conference on High Performance Computing Networking, Storage and Analysis, SC ’12, Salt Lake City, UT, USA - November 11-15, pp 1–10. https://doi.org/10.1109/SC.2012.3 Ishiyama T, Nitadori K, Makino J (2012) 4.45 Pflops astrophysical \(N\)-body simulation on K computer: the gravitational trillion-body problem. In: Hollingsworth JK (ed.) SC Conference on High Performance Computing Networking, Storage and Analysis, SC ’12, Salt Lake City, UT, USA - November 11-15, pp 1–10. https://​doi.​org/​10.​1109/​SC.​2012.​3
36.
Zurück zum Zitat Bédorf J, Gaburov E, Fujii MS, Nitadori K, Ishiyama T, Portegies Zwart S (2014) 24.77 Pflops on a Gravitational Tree-Code to Simulate the Milky Way Galaxy with 18600 GPUs. In: Damkroger T, Dongarra JJ (eds) International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014, New Orleans, LA, USA, November 16-21, pp 54–65. https://doi.org/10.1109/SC.2014.10 Bédorf J, Gaburov E, Fujii MS, Nitadori K, Ishiyama T, Portegies Zwart S (2014) 24.77 Pflops on a Gravitational Tree-Code to Simulate the Milky Way Galaxy with 18600 GPUs. In: Damkroger T, Dongarra JJ (eds) International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2014, New Orleans, LA, USA, November 16-21, pp 54–65. https://​doi.​org/​10.​1109/​SC.​2014.​10
37.
Zurück zum Zitat Ishiyama T, Yoshikawa K, Tanikawa A (2022) High Performance Gravitational N-body Simulations on Supercomputer Fugaku. In: HPC Asia 2022: international conference on high performance computing in Asia-Pacific Region, Virtual Event, Japan, January 12–14, pp 10–17. https://doi.org/10.1145/3492805.3492816 Ishiyama T, Yoshikawa K, Tanikawa A (2022) High Performance Gravitational N-body Simulations on Supercomputer Fugaku. In: HPC Asia 2022: international conference on high performance computing in Asia-Pacific Region, Virtual Event, Japan, January 12–14, pp 10–17. https://​doi.​org/​10.​1145/​3492805.​3492816
45.
Zurück zum Zitat Steffen M, Kirby RM, Berzins M (2008) Analysis and reduction of quadrature errors in the material point method (MPM). Int J Numer Methods Eng 76(6):922–948MathSciNetCrossRefMATH Steffen M, Kirby RM, Berzins M (2008) Analysis and reduction of quadrature errors in the material point method (MPM). Int J Numer Methods Eng 76(6):922–948MathSciNetCrossRefMATH
46.
Zurück zum Zitat Moutsanidis G, Long C, Bazilevs Y (2020) IGA-MPM: the isogeometric material point method. Comput Methods Appl Mech Eng 372:113346MathSciNetCrossRefMATH Moutsanidis G, Long C, Bazilevs Y (2020) IGA-MPM: the isogeometric material point method. Comput Methods Appl Mech Eng 372:113346MathSciNetCrossRefMATH
Metadata
Title
B-spline-based material point method with dynamic load balancing technique for large-scale simulation
Authors
Soma Hidano
Shaoyuan Pan
Keina Yoshida
Reika Nomura
Yohei Miki
Masatoshi Kawai
Shuji Moriguchi
Kengo Nakajima
Kenjiro Terada
Publication date
7 January 2025
Publisher
Springer London
Published in
Engineering with Computers
Print ISSN: 0177-0667
Electronic ISSN: 1435-5663
DOI
https://doi.org/10.1007/s00366-024-02099-4