Parallel direct Poisson solver for discretisations with one Fourier diagonalisable direction
Introduction
Direct numerical simulation (DNS) and large-eddy simulation (LES) of incompressible turbulent flows demand huge computing power and need parallel computers to be feasible. The Poisson equation, which arises from the incompressibility constraint and has to be solved at least once per time step, is usually the most time-consuming and difficult to parallelise part of the algorithm. Therefore, in this context the development of efficient and scalable Poisson solvers is of great interest.
Time-accurate DNS/LES simulations generally demand a large amount of time-steps (for DNS applications it can reach ∼106). If the mesh does not change during the simulation, the Poisson equation is solved repeatedly with different right-hand-side terms while the system matrix remains constant. In these cases, a pre-processing phase with relatively large computing demands can be accepted. Another usual feature in DNS/LES applications is to have at least one periodic homogeneous direction in the flow. This property makes the Fourier diagonalisation [1], [2] in the periodic direction(s) the best choice. The uniformity of the grid in these directions, imposed by the method, is suitable with the isotropic nature of the flow on them. In this work, we restrict ourselves to problems with only one periodic homogeneous direction, examples of this kind of configuration can be found in [3], [4], [5], [6]. In order to deal with complex geometries, no restrictions are imposed for the non-periodic spatial directions. Therefore, the method here proposed is suitable for extruded 2D unstructured meshes.
Different strategies can be used to solve the set of 2D systems resulting from the diagonalisation. The most widespread iterative options are the conjugate gradient (CG) [7], [8], and the multigrid [9], [10] methods. The former is easy to implement and shows good scalability. However, its performance is strongly dependent on the spectral condition number and usually requires an application-specific preconditioner [11]. On the other hand, the convergence rate of multigrid methods does not depend on the problem size and they work very well for certain problems. However, they also require a careful application-specific tuning. Alternatively, the performance of direct Schur-complement based algorithms [12], [13], [14], [15], irrespective of the condition number or the specific application, only depends on the sparsity pattern of the matrix. As a main drawback, they require a computationally demanding pre-processing phase and large memory resources. Nevertheless, as mentioned above, this additional pre-processing cost becomes almost negligible for the time-accurate applications here considered. And, since the set of systems are 2D, the memory requirements remain still feasible for the range of mesh sizes required for DNS/LES application.
In this paper, we propose a direct Schur complement based decomposition (DSD) method for the set of mutually independent 2D systems. Essentially, Schur complement based algorithms may differ in three aspects: (i) the determination of the interface that separates subdomains, (ii) the solver of the Schur complement (or interface) system, and (iii) the solver of the inner systems. The prevailing option is to solve the interface system by means of a preconditioned Krylov projection method [8], [16], [17]. In this case, it is common to use a double-sided interface, composed of the nodes of each subdomain which are linearly coupled with nodes of other subdomains. With this strategy, there are at least two interface mesh-lines between each pair of subdomains. This approach is convenient because the interface can be set locally, and the resulting block structure of the Schur complement matrix is well determined a priori [16]. On the other hand, in some situations, usually with relatively low numbers of CPUs, it can be more convenient to use a direct solver for the interface system. To do so, in some works [13], [18], [19], it is solved by means of a parallel factorisation or gathered into a master process to solve it sequentially. Another possible approach is to explicitly compute the inverse of the Schur complement matrix and distribute it among the processing elements [12], [20]. However, due to the high memory requirements of direct solvers, a one-sided interface strategy (a single mesh-line separating the subdomains) is more convenient. In this way, the resulting interface size is approximately halved.
In our previous works [12], [14], a DSD algorithm in conjunction with a fast Fourier transform (FFT) was successfully used to perform DNS/LES simulations on structured Cartesian grids [3], [21]. The interface was naturally built in a balanced manner, a band LU [22] was used as local solver and the inverse of the Schur complement matrix was calculated and stored in parallel. The distribution of data between the parallel processes was carried out in such a way that only a collective communication was needed in the solution phase. Although the latter implied some additional duplication of data, this approach was very suitable for parallel systems with a high latency network. Here, the extension of the DSD algorithm to general unstructured grids and modern supercomputers is presented. To do so, modification of the three above-mentioned points have been introduced. Namely, (i) a new algorithm to balance the one-sided interface, (ii) the local solver has been replaced by a sparse Cholesky factorisation [22], and finally, (iii) the Schur complement matrix is now treated sparsely instead of as a dense matrix. Besides, the distribution of data between the parallel processes has been modified in order to decrease the memory costs. This requires two additional point-to-point communications, which cost is almost negligible on modern supercomputers. All these changes have significantly accelerated the set-up and solve phases of the algorithm.
With regard to the overall parallelisation strategy, some important improvements have been introduced. In our previous works [11], [14], the same partition was used for both the physical and spectral spaces. This was a convenient approach for relatively low numbers of CPUs. However, this approach eventually limits the number of parallel processes in the periodic direction. Here, in order to overcome this drawback, the physical and spectral partitions can be chosen independently. With this new strategy, the range of efficient scalability has been significantly increased: maximal tests with 128 parallel processes in the periodic direction show that the scalability is still not exhausted.
All the numerical tests have been carried out on the IBM MareNostrum supercomputer at the Barcelona Supercomputing Center. Computing times and speed-up tests obtained for meshes up to 109 nodes using 8192 CPUs, illustrate the robustness and scalability of the method. Moreover, a direct comparison of the DSD as 2D solver with two preconditioned conjugate gradient PCG [8] methods is also presented and discussed.
The rest of the paper is arranged as follows. In Section 2, the numerical methods for the time-accurate solution of Navier–Stokes equations are briefly described. The Poisson solver is presented in Section 3. The parallelisation strategy is discussed in Section 4. In Section 5, numerical experiments are carried out to test the performance of the proposed parallel solver on the MareNostrum supercomputer. In Section 6, the DSD is successfully compared with the PCG in the context of DNS simulations of the flow around a circular cylinder at Re = 3900 and Re = 10,000. Finally, relevant results are summarised and conclusions are given in Section 7.
Section snippets
Geometry discretisation
In this work, geometric discretisations obtained by the uniform extrusion of generic 2D meshes are considered. Periodic boundary conditions are imposed in the extrusion direction, thus the linear couplings of the Poisson equation in such a direction result into circulant submatrices. Since the proposed algorithm does not impose any restriction for the initial 2D mesh, it is suitable for unstructured meshes. Nevertheless, this lack of structure leads to a more complex data management. An
Direct Poisson solver
In this section, the algorithm to solve the Poisson equation is described. As mentioned above, the problem under consideration readswhere the Laplacian operator, L, remains constant during all the simulation and Nt is the total number of time-steps. Under these conditions, the computational cost (per time-step) of any preprocessing phase is reduced by Nt times. Typically, for DNS applications Nt = 105 ∼ 106. Therefore, in general, the pre-processing costs become negligible.
As the
Distribution of parallel processes
The parallelisation of the solver is based on a geometric domain decomposition into P subdomains, one for each parallel process. Standard MPI is used to implement the algorithm. The partition of is carried out by dividing and into P2d and Pper parts respectively, being P = P2dPper. This is referred as a P2d × Pper-partition. To exemplify it, a 2 × 2-and a 4 × 1-partitions of a mesh are displayed in Fig. 7.
The parallelisation of Algorithm 1 can be divided into two parts: (i) the
Numerical experiments
All the numerical tests presented in this paper have been carried out on the MareNostrum supercomputer of the Barcelona supercomputing center (BSC). This is an IBM BladeCenter JS21 Cluster with 10240 PowerPC 970MP processors at 2.3 GHz. Quad-core nodes with 8 Gb are coupled by means of a high-performance Myrinet network.
The code has been compiled in a Linux SuSe distribution using the IBM XL C/C++ enterprise edition compiler version 10.1. Four external libraries are linked: (i) the automatically
Challenging DSD. Flow around a circular cylinder
In this section, the performance of the DSD as frequency solver is analysed by direct comparison with the standard preconditioned conjugate gradient (PCG) method [7], [8] with two different preconditioners. To carry out this analysis, direct numerical simulations of the flow around a circular cylinder at Reynolds numbers 3900 and 10,000 (based on the cylinder diameter, D, and the free-stream velocity), are used as problem models.
Concluding remarks
A parallel direct algorithm for the solution of the Poisson equation arising in incompressible flows with one periodic direction has been presented. It is a combination of a direct Schur-complement based decomposition (DSD) and a Fourier diagonalisation. The latter decomposes the original system into a set of mutually independent 2D systems which are solved by means of the DSD algorithm. Since no restrictions are imposed in the non-periodic directions, the overall algorithm is well suited for
Acknowledgements
This work has been financially supported by Termo Fluids S.L., and the Ministerio de Ciencia e Innovación (Spain); Contract/Grant No. ENE2010-17801 (Project: “Development of high performance parallel codes for the optimal design of thermal equipments”), and a Juan de la Cierva postdoctoral contract (JCI-2009-04910).
Calculations have been performed on the MareNostrum supercomputer at the Barcelona supercomputing center. The authors thankfully acknowledge this institution.
References (35)
- et al.
Direct numerical simulation of turbulent heat transfer in plane impinging jet
International Journal of Heat and Fluid Flow
(2004) - et al.
DNS of buoyancy-dominated turbulent flows on a bluff body using the immersed boundary method
Journal of Computational Physics
(2009) - et al.
A scalable parallel Poisson solver for three-dimensional problems with one periodic direction
Computers and Fluids
(2010) - et al.
Parallel Schur complement method for large-scale systems on distributed memory computers
Applied Mathematical Modelling
(2001) - et al.
Parallel two level block ILU preconditioning techniques for solving large sparse linear systems
Parallel Computing
(2002) - et al.
A parallel time-space least-squares spectral element solver for incompressible flow problems
Applied Mathematics and Computation
(2007) - et al.
Parameter-free symmetry-preserving regularization modeling of a turbulent differentially heated cavity
Computers and Fluids
(2010) - et al.
Kinetic energy conservation issues associated with the collocated mesh scheme for incompressible flow
Journal of Computational Physics
(2006) - et al.
Symmetry-preserving discretization of turbulent flow
Journal of Computational Physics
(2003) - et al.
Fully conservative higher order finite difference schemes for incompressible flow
Journal of Computational Physics
(1998)
DNS of flow past a stationary and oscillating cylinder at Re = 10,000
Journal of Fluids and Structures
Toeplitz and circulant matrices: A review
Foundations and Trends in Communications and Information Theory
Circulant Matrices
Direct numerical simulations of two- and three-dimensional turbulent natural convection flows in a differentially heated cavity of aspect ratio 4
Journal of Fluid Mechanics
Direct numerical simulation of turbulence in a square annular duct
Journal of Fluid Mechanics
Iterative Methods for Sparse Linear Systems
Cited by (41)
A FFT-accelerated multi-block finite-difference solver for massively parallel simulations of incompressible flows
2022, Computer Physics CommunicationsCitation Excerpt :Indeed, this method has allowed for breakthroughs in e.g. DNS of single-phase canonical turbulent flows [21,4], in complex geometries by using immersed boundary methods [10], and in multi-phase flows [22–24], with at least two open-source DNS codes, AFiD [25] and CaNS [20], leveraging this approach. Despite most works in the literature only exploiting the method of eigenfunctions expansions along periodic directions [26], these Fourier-based expansions may be actually employed for many different combinations of boundary conditions [18]. To our best knowledge, finite-difference numerical algorithms reported in the literature using FFT-based finite-difference solvers are restricted to very simple geometries such as a rectangular box [27,20] or cylindrical/spherical domains [25,28], which may be extended to handle more complex geometries using immersed boundary methods [29].
Three dimensionality in the wake of the flow around a circular cylinder at Reynolds number 5000
2017, Computers and FluidsCitation Excerpt :A more in depth presentation of the numerical schemes used can be found in Aljure et al. [1], Jofre et al. [11], Trias et al. [41]. As the meshes are constructed using a constant-step extrusion in the span wise direction of a 2D unstructured grid, the Poisson equation can be solved by means of a direct Schur-Fourier decomposition solver [3]. All simulations are run with the TermoFluids parallel unstructured code (www.termofluids.com) and are carried out on the in-house JFF cluster and on the Mare-nostrum III supercomputer, details on the computing characteristics are found in Aljure et al. [1].
Influence of rotation on the flow over a cylinder at Re=5000
2015, International Journal of Heat and Fluid FlowCitation Excerpt :This method decouples the initial system of equations into a set of two dimensional sub-systems, which are then solved by means of a direct Schur-complement decomposition solver (Borrell et al., 2011).
On the flow past a circular cylinder from critical to super-critical Reynolds numbers: Wake topology and vortex shedding
2015, International Journal of Heat and Fluid FlowAccelerating Poisson solvers in front tracking method using parallel direct methods
2015, Computers and Fluids