GPU accelerated computational homogenization based on a variational approach in a reduced basis framework
Introduction
Modern engineering mechanics simulations are dedicated to the prediction of the mechanical behavior of advanced materials. New types of lightweight construction materials, e.g. fiber reinforced composites in the aircraft industry, appear homogeneous from a macroscopic point of view although they display strong heterogeneities on smaller scales. These inhomogeneities on the microscale influence the effective macroscopic mechanical behavior. The determination of the effective material properties based on evaluations on the microscale is called homogenization. The underlying analytical theory is well-documented by a huge body of literature (see, e.g., the compendium of Nemat-Nasser and Hori [1]). However, analytical methods reach their limits as the geometric complexity of the microstructure increases and when accurate predictions of nonlinear material behavior are required.
The fast-growing information technology sector and the attendant increase in computing capacities gave rise to the development of appropriate tools in the field of computer assisted mechanical simulations. Especially the well-known finite element method (FEM, e.g., Bathe [2]) has proven to be a powerful tool in computer aided engineering. The use of a nested FEM approach for homogenization problems was introduced by Feyel [3]. Thereby, a discretized microstructure is assigned to every integration point on the macroscale and solved numerically using an individual FEM simulation. However, this method results in massive computational effort and tremendous memory requirements, even for two-dimensional problems.
The unacceptable computational cost of such solutions gave rise to the development of reduced models, namely the Transformation Field Analysis (TFA, Dvorak and Benveniste [4]) and its successor, the Nonuniform Transformation Field Analysis (NTFA, [5], [6], [7], [8]). The key idea of the NTFA is the approximation of the inelastic strain field using a finite-dimensional, spatially heterogeneous basis, the elements of which are denoted as inelastic modes. The number of degrees of freedom of the reduced problem equals the number of mode activity coefficients, which represent effective (macroscopic) internal variables. Usually this number is several orders of magnitude smaller than in a full-field FEM simulation, which explains the computational efficiency of the method. In its original form [5], [6], the NTFA determined the evolution of the mode activity based on a semi-phenomenological law. While giving good accuracy in comparison with full-field simulations, the simplicity of the evolution law is the bottleneck of the NTFA, limiting its field of application to rather simple material models. The NTFA differs from other well-established model order reduction methods, such as hyperreduction [9], [10] or the proper generalized decomposition (PGD, e.g., [11]), by the consideration of the micromechanics in the reduced basis formulation.
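The mode superposition underlying the NTFA can be sketched as follows. This is a minimal CPU illustration only; the field layout, the function name and the sampling of the modes at discrete material points are assumptions for the sake of the example, not the paper's implementation:

```cpp
#include <cassert>
#include <vector>

// Reduced-basis approximation of the inelastic strain field:
//   eps_p(x) ≈ sum_k xi_k * mu_k(x),
// with n_m inelastic modes mu_k sampled at n_p material points
// (6 strain components per point). All names are illustrative.
std::vector<double> reconstruct(const std::vector<double>& modes, // n_m * n_p * 6 values
                                const std::vector<double>& xi,    // n_m mode activity coefficients
                                int n_p) {
    const int n_m = static_cast<int>(xi.size());
    std::vector<double> eps_p(n_p * 6, 0.0);
    for (int k = 0; k < n_m; ++k)                 // superpose the weighted modes
        for (int i = 0; i < n_p * 6; ++i)
            eps_p[i] += xi[k] * modes[k * n_p * 6 + i];
    return eps_p;
}
```

The reduced problem then only tracks the few coefficients xi_k instead of the full inelastic strain field, which is the source of the savings discussed above.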
In order to overcome the limitations of former implementations of the NTFA, Fritzen and Böhlke [12] proposed a technique for the evolution of the mode activity coefficients of viscoelastic composites based on the exact homogenization of the macroscopic free energy and the dissipation. Thereby, the prior drawbacks of the semi-phenomenological approach could be eliminated. The extension recently presented by Fritzen and Leuschner [13] eliminates the restriction to rather simple material models on the microscale, which was a major drawback of the NTFA. In the new method the constitutive models on the microscale may be chosen from the class of Generalized Standard Materials (GSM, [14], [15], [16]) in a unified framework. The resulting reduced order homogenization method is suitable for many real world applications. It no longer incorporates any phenomenological assumptions regarding the evolution of the mode activity coefficients. The latter now stems from a mixed incremental multiscale formulation derived from the incremental variational approach of Ortiz and Stainier [17], incorporating ideas of, e.g., [18]. The mixed incremental variational approach was found to give accurate results by the aforementioned authors. However, the exceptional computational efficiency of the original NTFA is considerably reduced in this formulation. The increase in the computational effort is due to large numbers of rather simple function evaluations that have to be conducted on the microscopic level repeatedly. More precisely, the free energy and the dual dissipation potential and their gradients have to be evaluated at each microscopic position in each iteration of the time integration procedure.
In addition, small-sized linear algebraic operations such as matrix multiplications are required after each of these function calls, which results in a considerable overall computational cost even though the algorithmic complexity of each individual operation is negligible.
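The per-point workload consists of exactly such small dense kernels. The following CPU sketch shows the two recurring operation types, a small matrix–vector product and a triple matrix product; the sizes (6 strain components, N = 4 modes) and function names are illustrative assumptions, not taken from the paper:

```cpp
#include <cassert>

constexpr int D = 6;   // components of a symmetric second-order tensor in vector form
constexpr int N = 4;   // number of reduced modes (illustrative)

// y = M * x, with M stored row-major as a D x N matrix
void matvec(const double* M, const double* x, double* y) {
    for (int i = 0; i < D; ++i) {
        double s = 0.0;
        for (int j = 0; j < N; ++j) s += M[i * N + j] * x[j];
        y[i] = s;
    }
}

// C = A^T * B * A, with A (D x N) and B (D x D); the result C is N x N
void triple_product(const double* A, const double* B, double* C) {
    double T[D][N];                       // intermediate T = B * A
    for (int i = 0; i < D; ++i)
        for (int j = 0; j < N; ++j) {
            double s = 0.0;
            for (int k = 0; k < D; ++k) s += B[i * D + k] * A[k * N + j];
            T[i][j] = s;
        }
    for (int i = 0; i < N; ++i)           // C = A^T * T
        for (int j = 0; j < N; ++j) {
            double s = 0.0;
            for (int k = 0; k < D; ++k) s += A[k * N + i] * T[k][j];
            C[i * N + j] = s;
        }
}
```

Each individual call is cheap, but it is repeated for every integration point in every Newton iteration, which is what makes batching attractive.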
In the present paper an acceleration of the multiscale method of Fritzen and Leuschner [13] is presented that exploits massive parallelization of the independent function evaluations on graphics processing units (GPUs). Over the past decade, so-called general-purpose computing on graphics processing units (GPGPU, [19]) has gained massive influence in science, e.g., in the fields of dense linear algebra [20], sorting algorithms [21] and engineering-related applications such as atomistic simulations [22] or fluid dynamics [23]. In comparison with a central processing unit (CPU), a GPU is dedicated to the concurrent, i.e. parallel, execution of the same instruction on different sets of data. The benefits of GPUs are their high computational capacity and their high memory bandwidth, both of which are related to the level of parallelism. GPGPU requires new kinds of programming models, languages and paradigms. We decided to use the CUDA framework of Nvidia, which supports different programming languages, in particular C/C++ [24]. The framework provides access to the GPU through user-written routines (the kernels) that are executed on the GPU. This yields a heterogeneous structure of the source code, which is written for both the CPU and the GPU.
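The execution model described above — every thread running the same kernel body on its own slice of the data — can be mimicked on the CPU to fix ideas. In the sketch below the kernel and launch function are hypothetical stand-ins (on the GPU the loop over thread indices would run concurrently, and the launch would be a CUDA kernel call):

```cpp
#include <cassert>

// CPU mock-up of the SIMT idea: the same "kernel" body is executed once per
// thread index, each instance working on exactly one entry of the data.
void saxpy_kernel(int tid, float a, const float* x, float* y) {
    y[tid] += a * x[tid];   // one thread updates one entry
}

// On a GPU these iterations would execute in parallel; here they are serialized.
void launch(int n_threads, float a, const float* x, float* y) {
    for (int tid = 0; tid < n_threads; ++tid) saxpy_kernel(tid, a, x, y);
}
```

The kernel body is free of any loop over the data set; the parallel hardware supplies the iteration, which is the essential difference to CPU-style code.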
In this work a high performance numerical library for Nvidia GPUs is developed in order to accelerate the homogenization procedure by exploiting the capabilities offered by modern GPUs. In Section 2 the governing equations for the mechanical two-scale problem are derived. An extension of the method for the consideration of spatially heterogeneous hardening variables is introduced, which improves the accuracy in the presence of highly heterogeneous hardening states. The underlying Newton–Raphson scheme used for the iterative computation of the evolution of the mode activity coefficients is analyzed in detail with respect to its algorithmic properties. The parts of the algorithm that are promising candidates for GPU acceleration are identified in Section 3. These are the aforementioned batched operations, i.e. matrix–vector and matrix–matrix–matrix (triple matrix) multiplications, as well as the computation of the energetic quantities, namely the gradients of the local potentials. These operations have to be evaluated at each integration point of the microscale problem, which can lead to hundreds of thousands of function calls per iteration, depending on the resolution of the discretized representative volume element (RVE) on the microscale. For an efficient implementation of these operations, the new STMV (Single Thread Multiple Values) principle for the batched processing of small independent problems is presented in Section 3. The particular hardware structure of the GPU and the properties of the CUDA framework are taken into account in order to attain efficient algorithms. The performance capabilities of the GPU accelerated implementations of the batched linear algebraic operations are demonstrated in Section 4 by comparison to CPU solutions. Finally, Section 4.8 gives performance considerations for the GPU accelerated implementation of the complete reduced basis homogenization algorithm, which incorporates all of the aforementioned GPU kernels.
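A batched operation in the STMV spirit assigns one complete small problem to one worker, rather than splitting a single small product across cooperating threads. The following CPU sketch serializes the batch loop that the GPU would distribute over threads; the sizes and names are illustrative assumptions, not the paper's library interface:

```cpp
#include <cassert>

constexpr int D = 6;   // strain components per integration point
constexpr int N = 4;   // number of reduced modes (illustrative)

// Batched mat-vec in the STMV style: one (mock) thread p processes one
// integration point and computes its full D x N product on its own.
void stmv_batched_matvec(int n_points, const double* Ms, const double* xs, double* ys) {
    for (int p = 0; p < n_points; ++p) {       // on the GPU: p = thread index
        const double* M = Ms + p * D * N;      // contiguous per-point operands
        const double* x = xs + p * N;
        double*       y = ys + p * D;
        for (int i = 0; i < D; ++i) {
            double s = 0.0;
            for (int j = 0; j < N; ++j) s += M[i * N + j] * x[j];
            y[i] = s;
        }
    }
}
```

Since the small problems are mutually independent, no synchronization between workers is needed, which is what makes this batching pattern map well onto the GPU.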
The GPU implementation is used to predict the nonlinear overall behavior of a realistic high resolution microstructural problem representing a fiber reinforced material with a visco-plastic matrix. The computational gains provided by the GPU supported implementation of the linear algebraic subroutines are found to be in the order of 10 over state of the art CPU implementations (one GPU versus a 24 core workstation). Consequently, the RB-MOR scheme is accelerated considerably, while a high accuracy of the anisotropic material response is preserved in the reduced framework.
An index free notation is used with lowercase symbols denoting scalars, lowercase bold face letters denoting first-order tensors (or vectors), and uppercase Latin and lowercase Greek bold face letters representing second-order tensors. Fourth-order tensors are typeset in a distinct (blackboard bold) font. The two isotropic projectors are defined using the identity tensor and the identity on symmetric second-order tensors. A six-dimensional basis of orthonormal symmetric second-order tensors and a five-dimensional basis of orthonormal deviatoric tensors are assumed. Besides tensorial quantities intrinsically living in a spatial setting, a series of real-valued tuples is used. They are referred to as vectors and are typeset as accented lowercase letters; in analogy, matrices are denoted by accented uppercase letters. A symmetric second-order tensor then has an equivalent six-dimensional vector representation. Similarly, a fourth-order tensor with both minor symmetries has a 6×6 matrix representation. Arguments such as the position or the time are omitted for brevity if not explicitly required in the given context. Further, we define the volume averaging operator over the RVE, where the argument may represent any field variable.
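Because the six basis tensors are orthonormal, the vector representation is of normalized (Mandel) type: the shear entries carry a factor of the square root of two so that the tensor scalar product equals the ordinary dot product of the six-vectors. A minimal sketch of this mapping (component ordering and function names are assumptions for illustration):

```cpp
#include <cassert>
#include <cmath>

// Normalized (Mandel-type) 6-vector of a symmetric 3x3 tensor. The sqrt(2)
// weights on the shear entries preserve the scalar product: A : B = a . b.
void to_vec6(const double A[3][3], double a[6]) {
    const double s = std::sqrt(2.0);
    a[0] = A[0][0]; a[1] = A[1][1]; a[2] = A[2][2];
    a[3] = s * A[1][2]; a[4] = s * A[0][2]; a[5] = s * A[0][1];
}

// Full tensor scalar product A : B for reference
double ddot(const double A[3][3], const double B[3][3]) {
    double r = 0.0;
    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j) r += A[i][j] * B[i][j];
    return r;
}
```

This normalization is what allows the small matrix–vector operations of the reduced scheme to work directly with 6-vectors and 6×6 matrices without bookkeeping of shear factors.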
Section snippets
Two-scale problem
In the following, attention is confined to two-scale isothermal mechanical problems. In this case the material length scale and the structural length scale are clearly separated (see Fig. 1), in the sense that the fluctuations of the macroscopic fields are assumed negligible for spatial variations in the order of the size of the representative volume element (RVE). Macroscopic variables are over-lined, such as the displacement field, the strain field and
Embedded GPU accelerated parts
In the following, a finite element discretization of the RVE is assumed, with a given number of integration points of the discretized FE problem. According to Section 2.4, the increments of the effective internal variables are determined via the root finding problem (44). The structure of the evolution law requires function evaluations at every integration point on the microscale. These are necessary for the assembly of the right hand side vector in (44) and the
General benchmarking methodology
When it comes to the benchmarking of parallel algorithms on different platforms, it is hard to obtain objective data. Depending on the representation of the results, either the parallel or the sequential implementation can easily be favored (see, e.g., [37]). In order to provide information that is as objective as possible, we have developed a specific benchmarking methodology: In the following Sections 4.2–4.7 an attempt to assess the performance of our GPU implementation is made. More specifically, we first
Summary and conclusions
A new heterogeneous programming model in the context of order-reduction based homogenization methods was investigated. The theoretical framework of the homogenization scheme refers to the reduced basis model order reduction (RB-MOR) which was recently proposed by the authors [13]. The approach extends the Nonuniform Transformation Field Analysis (NTFA) to a more general form, such that solids of the class of Generalized Standard Materials (GSM) can be considered. In the present work two
Acknowledgments
The authors gratefully acknowledge the detailed and valuable comments of the anonymous reviewers.
Funding of this work through the YIG Computer Aided Material Modeling of the Karlsruhe Institute of Technology (KIT) within the Excellence Initiative of the German Research Foundation (DFG) and via DFG grant FR-2702/3 is gratefully acknowledged.
References (37)
Multiscale FE elastoviscoplastic analysis of composite structures, Comput. Mater. Sci. (1999)
Nonuniform transformation field analysis, Int. J. Solids Struct. (2003)
Computational analysis of nonlinear composite structures using the nonuniform transformation field analysis, Comput. Methods Appl. Mech. Engrg. (2004)
Multi-level a priori hyper reduction of mechanical models involving internal variables, Comput. Methods Appl. Mech. Engrg. (2010)
Reduced basis homogenization of viscoelastic composites, Compos. Sci. Technol. (2013)
Reduced basis hybrid computational homogenization based on a mixed incremental formulation, Comput. Methods Appl. Mech. Engrg. (2013)
Algorithms for the solution of internal variable problems in plasticity, Comput. Methods Appl. Mech. Engrg. (1991)
A multi-field incremental variational framework for gradient-extended standard dissipative solids, J. Mech. Phys. Solids (2011)
General purpose molecular dynamics simulations fully implemented on graphics processing units, J. Comput. Phys. (2008)
Large calculation of the flow over a hypersonic vehicle using a GPU, J. Comput. Phys. (2008)
Computational homogenization of elasto-plastic porous metals, Int. J. Plast.
Nonuniform transformation field analysis of elastic-viscoplastic composites, Compos. Sci. Technol.
Nonuniform transformation field analysis of materials with morphological anisotropy, Compos. Sci. Technol.
A nonuniform TFA homogenization technique based on piecewise interpolation functions of the inelastic field, Int. J. Solids Struct.
Micromechanics: Overall Properties of Heterogeneous Materials
Finite Element Procedures
On transformation strains and uniform fields in multiphase elastic media, Proc. R. Soc. London A
Three-dimensional finite element implementation of the nonuniform transformation field analysis, Int. J. Numer. Methods Eng.