GPU accelerated computational homogenization based on a variational approach in a reduced basis framework

https://doi.org/10.1016/j.cma.2014.05.006

Abstract

Computational multiscale methods such as the FE² technique (Feyel, 1999) entail large demands on both CPU time and memory. In order to significantly reduce the computational cost of multiscale methods, the authors recently proposed a hybrid computational homogenization method for visco-plastic materials using a reduced basis approach in a mixed variational formulation (Fritzen and Leuschner, 2013). In the present contribution two extensions of the method are introduced: First, the previous proposal is extended by allowing for heterogeneous hardening variables instead of piecewise constant fields, which improves the accuracy of the method. Second, a massively parallel GPU implementation of the algorithm using Nvidia's CUDA framework is presented. The GPU subroutines for the batched linear algebraic operations are integrated into a specialized library in order to facilitate their use. The impact of the heterogeneous hardening states on the accuracy and the performance gains obtained from the dedicated GPU implementation are illustrated by means of numerical examples. An overall speedup on the order of 10⁴ with respect to a high-performance finite element implementation is achieved while preserving good accuracy of the predicted nonlinear material response.

Introduction

Modern engineering mechanics simulations are dedicated to the prediction of the mechanical behavior of advanced materials. New types of lightweight construction materials, e.g. fiber-reinforced composites in the aircraft industry, appear homogeneous from a macroscopic point of view although they display strong heterogeneities on smaller scales. These inhomogeneities on the microscale influence the effective macroscopic mechanical behavior. The determination of the effective material properties based on evaluations on the microscale is called homogenization. The underlying analytical theory is well documented by a huge body of literature (see, e.g., the compendium of Nemat-Nasser and Hori [1]). However, analytical methods have their limitations for increasing geometric complexity of the microstructure and when it comes to accurate predictions of nonlinear material behavior.

The fast-growing information technology sector and the attendant increase in computing capacities gave rise to the development of appropriate tools in the field of computer-assisted mechanical simulation. In particular, the well-known finite element method (FEM, e.g., Bathe [2]) has proven to be a powerful tool in computer-aided engineering. The use of a nested FEM approach for homogenization problems was introduced by Feyel [3] in terms of the nested finite element method FE². Thereby, a discretized microstructure is assigned to every integration point on the macroscale and solved numerically using an individual FEM simulation. However, this method results in massive computational effort and tremendous memory requirements, even for two-dimensional problems.

The unacceptable computational cost of FE² solutions gave rise to the development of reduced models, namely the Transformation Field Analysis (TFA, Dvorak and Benveniste [4]) and its successor, the Nonuniform Transformation Field Analysis (NTFA, [5], [6], [7], [8]). The key idea of the NTFA is the approximation of the inelastic strain field using a finite-dimensional, spatially heterogeneous basis, the elements of which are denoted as inelastic modes. The number of degrees of freedom of the reduced problem is the number of mode activity coefficients, which represent effective (macroscopic) internal variables. Usually this number is several orders of magnitude smaller than in a full-field FEM simulation, which explains the computational efficiency of the method. In its original form [5], [6], the NTFA determined the evolution of the mode activity based on a semi-phenomenological law. While giving good accuracy in comparison with full-field simulations, the simplicity of the evolution law is the bottleneck of the NTFA, limiting its field of application to rather simple material models. The NTFA differs from other well-established model order reduction methods such as hyper-reduction [9], [10] or the proper generalized decomposition (PGD, e.g., [11]) due to the consideration of the micromechanics in the reduced basis formulation.

In order to overcome the limitations of former implementations of the NTFA, Fritzen and Böhlke [12] proposed a technique for the evolution of the mode activity coefficients of viscoelastic composites based on the exact homogenization of the macroscopic free energy and the dissipation. Thereby, the prior drawbacks of the semi-phenomenological approach could be eliminated. The extension recently presented by Fritzen and Leuschner [13] eliminates the restriction to rather simple material models on the microscale, which was a major drawback of the NTFA. In the new method the constitutive models on the microscale may be chosen from the class of Generalized Standard Materials (GSM, [14], [15], [16]) in a unified framework. The resulting reduced order homogenization method is suitable for many real-world applications. It no longer incorporates any phenomenological assumptions regarding the evolution of the mode activity coefficients. The latter now stems from a mixed incremental multiscale formulation derived from the incremental variational approach of Ortiz and Stainier [17], incorporating ideas of, e.g., [18]. The mixed incremental variational approach was found by these authors to give accurate results. However, the exceptional computational efficiency of the NTFA, which allowed for savings of up to a factor of 10¹⁰, reduces to savings in the range of 10². The increase in computational effort is due to the large number of rather simple function evaluations that have to be conducted repeatedly on the microscopic level. More precisely, the free energy and the dual dissipation potential and their gradients have to be evaluated at each microscopic position in each iteration of the time integration procedure. In addition, small linear algebraic operations such as matrix multiplications are required after each of these function calls, which results in a considerable overall computational cost even though the algorithmic complexity of the individual operations is negligible.

In the present paper an acceleration of the multiscale method of Fritzen and Leuschner [13] is presented, exploiting massive parallelization of the independent function evaluations on graphics processing units (GPUs). Over the past decade, so-called general-purpose computing on graphics processing units (GPGPU, [19]) has gained massive influence in science, e.g. in the fields of dense linear algebra [20], sorting algorithms [21] and engineering-related applications such as atomistic simulations [22] or fluid dynamics [23]. In comparison with a central processing unit (CPU), a GPU is dedicated to the concurrent, i.e. parallel, execution of the same instruction on different sets of data. The benefits of GPUs are their high computational capacity and their high memory bandwidth, which are both related to the level of parallelism. GPGPU requires new kinds of programming models, languages and paradigms. We decided to use the CUDA framework of Nvidia, which supports different programming languages, in particular C/C++ [24]. The framework provides access to the GPU via user-written routines (the kernels) that are executed on the GPU. This yields a heterogeneous structure of the source code, which is written for both the CPU and the GPU.
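To make this heterogeneous structure concrete, the following minimal CUDA sketch shows host (CPU) code launching a kernel that many GPU threads execute concurrently, each on different data. It is a generic example, not taken from the library developed in this work; all identifiers (scale_kernel, n, etc.) are chosen purely for illustration.

```cuda
#include <cuda_runtime.h>

// Minimal illustration of the heterogeneous CUDA programming model: host (CPU)
// code launches a kernel that many GPU threads execute concurrently, each on
// different data. All identifiers are illustrative, not from the paper's library.
__global__ void scale_kernel(const double* in, double* out, double factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // global thread index
    if (i < n)                                     // guard against excess threads
        out[i] = factor * in[i];                   // same instruction, different data
}

int main()
{
    const int n = 1 << 20;
    double *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(double));        // allocate device memory
    cudaMalloc(&d_out, n * sizeof(double));
    // ... fill d_in, e.g. via cudaMemcpy from a host array ...
    const int block = 256;                         // threads per block
    const int grid  = (n + block - 1) / block;     // blocks needed to cover n
    scale_kernel<<<grid, block>>>(d_in, d_out, 2.0, n);
    cudaDeviceSynchronize();                       // kernel launches are asynchronous
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```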

In this work a high-performance numerical library for Nvidia GPUs is developed in order to accelerate the homogenization procedure by exploiting the capabilities offered by modern GPUs. In Section 2 the governing equations for the mechanical two-scale problem are derived. An extension of the method allowing for spatially heterogeneous hardening variables is introduced, which improves the accuracy in the presence of highly heterogeneous hardening states. The underlying Newton–Raphson scheme used for the iterative computation of the evolution of the mode activity coefficients is analyzed in detail with respect to its algorithmic properties. The parts of the algorithm that are attractive candidates for GPU acceleration are identified in Section 3. These are the aforementioned batched operations, i.e. matrix–vector and matrix–matrix–matrix (triple matrix) multiplications, as well as the computation of the energetic quantities, namely the gradients of the local potentials. These operations have to be evaluated at each integration point of the microscale problem, which can lead to hundreds of thousands of function calls per iteration, depending on the resolution of the discretized representative volume element (RVE) on the microscale. For an efficient implementation of these operations, the new STMV (Single Thread Multiple Values) principle for the batched processing of small independent problems is presented in Section 3 and sketched below. The particular hardware structure of the GPU and the properties of the CUDA framework are taken into account in order to attain efficient algorithms. The performance of the GPU accelerated implementations of the batched linear algebraic operations is demonstrated in Section 4 by comparison to CPU solutions. Finally, Section 4.8 provides performance considerations for the GPU accelerated implementation of the complete reduced basis homogenization algorithm, which incorporates all of the aforementioned GPU kernels. The GPU implementation is used to predict the nonlinear overall behavior of a realistic high-resolution microstructural problem representing a fiber reinforced material with a visco-plastic matrix. The computational gains provided by the GPU supported implementation of the linear algebraic subroutines are found to be on the order of 10 over state-of-the-art CPU implementations (one GPU versus a 24-core workstation). Consequently, the RB-MOR scheme is accelerated by a factor of 10⁴ and higher, while preserving a high accuracy of the anisotropic material response in the reduced framework.
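The STMV idea can be sketched as follows: instead of several threads cooperating on one product, each GPU thread processes one complete small problem, e.g. one matrix–vector product per integration point. The following hypothetical kernel illustrates this for batched 6×6 products; the contiguous per-point array layout and all names are assumptions for illustration, not the paper's actual interface.

```cuda
// Hypothetical sketch of the STMV principle: each thread owns one complete
// small problem (here: one 6x6 matrix-vector product per integration point)
// instead of several threads cooperating on a single product. The contiguous
// per-point array layout and all names are assumptions for illustration.
__global__ void batched_matvec6(const double* A,   // ngp blocks of 36 entries (row-major)
                                const double* x,   // ngp blocks of 6 entries
                                double*       y,   // ngp blocks of 6 entries
                                int           ngp)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x; // one thread = one integration point
    if (p >= ngp) return;
    const double* Ap = A + 36 * p;
    const double* xp = x + 6 * p;
    double*       yp = y + 6 * p;
    for (int i = 0; i < 6; ++i) {
        double s = 0.0;
        for (int j = 0; j < 6; ++j)
            s += Ap[6 * i + j] * xp[j];
        yp[i] = s;
    }
}
// launch, e.g.: batched_matvec6<<<(ngp + 255) / 256, 256>>>(A, x, y, ngp);
```

In a tuned implementation the memory layout would be reconsidered (e.g. a structure-of-arrays ordering) so that neighboring threads access neighboring addresses and the loads coalesce.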

An index-free notation is used with lowercase symbols denoting scalars, e.g. $c, \psi, \phi$, lowercase bold face letters denoting first-order tensors (or vectors), e.g. $\boldsymbol{x}, \boldsymbol{u}$, and uppercase Latin and lowercase Greek bold face letters representing second-order tensors, e.g. $\boldsymbol{A}, \boldsymbol{\sigma}, \boldsymbol{\varepsilon}$. Fourth-order tensors are typeset as $\mathbb{C}, \mathbb{A}, \mathbb{P}$ and alike. The two isotropic projectors are defined using the identity tensor $\boldsymbol{I}$ and the identity on symmetric tensors $\mathbb{I}^{\mathrm{s}}$ by
$$\mathbb{P}_1 = \tfrac{1}{3}\, \boldsymbol{I} \otimes \boldsymbol{I}, \qquad \mathbb{P}_2 = \mathbb{I}^{\mathrm{s}} - \mathbb{P}_1 .$$
A six-dimensional basis of orthonormal symmetric second-order tensors $\boldsymbol{B}^{(\alpha)}$ ($\alpha = 1, \dots, 6$) and a five-dimensional basis of orthonormal deviatoric tensors $\boldsymbol{B}'^{(\alpha)}$ ($\alpha = 1, \dots, 5$) are assumed. Besides tensorial quantities, which intrinsically live in a spatial setting, a series of real-valued tuples is used. These are referred to as vectors and are typeset as lowercase letters with a hat accent, e.g. $\hat{\xi}, \hat{\tau}$. In analogy, matrices are denoted by uppercase letters $\hat{A}$. A symmetric second-order tensor $\boldsymbol{\sigma}$ then has an equivalent vector representation
$$\boldsymbol{\sigma} = \sum_{\alpha=1}^{6} \sigma_\alpha \boldsymbol{B}^{(\alpha)} \quad \longleftrightarrow \quad \hat{\sigma} = (\sigma_1, \dots, \sigma_6)^{\mathsf{T}} .$$
Similarly, a fourth-order tensor $\mathbb{C}$ with both minor symmetries has a $6 \times 6$ matrix representation $\hat{C}$. Arguments such as the position $\boldsymbol{x}$ or the time $t$ are omitted for brevity if not explicitly required in the given context. Further, we define the averaging operator
$$\langle \bullet \rangle = \frac{1}{|\Omega|} \int_\Omega \bullet \; \mathrm{d}V ,$$
where $\bullet$ may represent any field variable.
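The text does not spell out the basis $\boldsymbol{B}^{(\alpha)}$ explicitly; one standard choice satisfying the orthonormality requirement is the normalized Voigt (Mandel) basis, stated here only for concreteness:

```latex
% One standard orthonormal basis of symmetric second-order tensors
% (normalized Voigt / Mandel form); the factor sqrt(2)/2 ensures
% B^{(\alpha)} \cdot B^{(\beta)} = \delta_{\alpha\beta}.
\begin{align*}
  B^{(1)} &= e_1 \otimes e_1, &
  B^{(2)} &= e_2 \otimes e_2, &
  B^{(3)} &= e_3 \otimes e_3, \\
  B^{(4)} &= \tfrac{\sqrt{2}}{2} \left( e_2 \otimes e_3 + e_3 \otimes e_2 \right), &
  B^{(5)} &= \tfrac{\sqrt{2}}{2} \left( e_1 \otimes e_3 + e_3 \otimes e_1 \right), &
  B^{(6)} &= \tfrac{\sqrt{2}}{2} \left( e_1 \otimes e_2 + e_2 \otimes e_1 \right).
\end{align*}
```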

Section snippets

Two-scale problem

In the following, attention is confined to two-scale isothermal mechanical problems. In this case the material length scale $l$ and the structural length scale $L$ are clearly separated (see Fig. 1), in the sense that the fluctuations of macroscopic fields are assumed to be negligible for spatial variations on the order of the size of the representative volume element (RVE). Macroscopic variables are over-lined, such as the displacement field $\bar{\boldsymbol{u}}$, the strain field $\bar{\boldsymbol{\varepsilon}} = \operatorname{sym}(\operatorname{grad}_{\bar{\boldsymbol{x}}}(\bar{\boldsymbol{u}}))$ and …
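The snippet is truncated; for orientation, the classical first-order homogenization relations that are consistent with the averaging operator $\langle \bullet \rangle$ defined above read (standard theory, restated here rather than quoted from the paper):

```latex
% Classical first-order homogenization relations (standard theory, restated
% for orientation; not quoted from the truncated snippet above): macroscopic
% strain and stress are volume averages, linked by the Hill-Mandel condition.
\bar{\boldsymbol{\varepsilon}} = \langle \boldsymbol{\varepsilon} \rangle ,
\qquad
\bar{\boldsymbol{\sigma}} = \langle \boldsymbol{\sigma} \rangle ,
\qquad
\bar{\boldsymbol{\sigma}} \cdot \dot{\bar{\boldsymbol{\varepsilon}}}
  = \langle \boldsymbol{\sigma} \cdot \dot{\boldsymbol{\varepsilon}} \rangle
\quad \text{(Hill--Mandel condition)} .
```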

Embedded GPU accelerated parts

In the following, a finite element discretization of the RVE $\Omega$ is assumed. The number of integration points of the discretized FE problem is denoted by $n_{\mathrm{gp}}$. According to Section 2.4, the increments of the effective internal variables $\hat{\xi}$ and $\hat{\lambda}$ are determined via the root finding problem (44). The structure of the evolution law requires function evaluations at every integration point $\boldsymbol{x}_i$ on the microscale. These are necessary for the assembly of the right hand side vector $\hat{f}$ in (44) and the …
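Among the batched operations named in the introduction, the triple matrix product is the most involved per integration point. A hypothetical STMV-style kernel for it could look as follows; fixed 6×6 matrices and a contiguous per-point layout are assumed purely for illustration, whereas in the actual method the dimensions depend on the number of modes.

```cuda
// Hypothetical STMV-style kernel for the batched triple matrix product
// D_p = A_p * B_p * C_p over all integration points p. Fixed 6x6 matrices and
// a contiguous per-point layout are assumed here purely for illustration; in
// the actual method the dimensions depend on the number of modes.
__global__ void batched_triple_mm6(const double* A, const double* B,
                                   const double* C, double* D, int ngp)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x; // one thread per point
    if (p >= ngp) return;
    const double *Ap = A + 36 * p, *Bp = B + 36 * p, *Cp = C + 36 * p;
    double T[36];                                  // T = B_p * C_p (thread-local)
    for (int i = 0; i < 6; ++i)
        for (int j = 0; j < 6; ++j) {
            double s = 0.0;
            for (int k = 0; k < 6; ++k)
                s += Bp[6 * i + k] * Cp[6 * k + j];
            T[6 * i + j] = s;
        }
    double* Dp = D + 36 * p;                       // D_p = A_p * T
    for (int i = 0; i < 6; ++i)
        for (int j = 0; j < 6; ++j) {
            double s = 0.0;
            for (int k = 0; k < 6; ++k)
                s += Ap[6 * i + k] * T[6 * k + j];
            Dp[6 * i + j] = s;
        }
}
```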

General benchmarking methodology

When it comes to the benchmarking of parallel algorithms on different platforms, it is hard to obtain objective data: depending on the presentation of the results, either the parallel or the sequential implementation can easily be favored (see, e.g., [37]). In order to provide information that is as objective as possible, we have developed a specific benchmarking methodology: in the following Sections 4.2–4.7 an attempt to assess the performance of our GPU implementation is made. More specifically, we first …
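One practical ingredient of fair GPU timing is the use of CUDA events, since kernel launches are asynchronous and a naive CPU wall clock around the launch would under-report the GPU time. The following self-contained sketch (with a placeholder kernel; not the paper's actual benchmark harness) shows the pattern:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy_kernel(double* x, int n)     // stand-in for a kernel under test
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0;
}

// Timing with CUDA events: kernel launches are asynchronous, so a naive CPU
// wall clock around the launch would under-report the actual GPU time.
int main()
{
    const int n = 1 << 20;
    double* d_x;
    cudaMalloc(&d_x, n * sizeof(double));
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    dummy_kernel<<<(n + 255) / 256, 256>>>(d_x, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);                    // wait until 'stop' is reached
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);        // elapsed GPU time in ms
    printf("kernel time: %.3f ms\n", ms);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_x);
    return 0;
}
```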

Summary and conclusions

A new heterogeneous programming model in the context of order-reduction based homogenization methods was investigated. The theoretical framework of the homogenization scheme is the reduced basis model order reduction (RB-MOR) recently proposed by the authors [13]. The approach extends the Nonuniform Transformation Field Analysis (NTFA) to a more general form, such that solids of the class of Generalized Standard Materials (GSM) can be considered. In the present work two …

Acknowledgments

The authors gratefully acknowledge the detailed and valuable comments of the anonymous reviewers.

Funding of this work through the Young Investigator Group (YIG) Computer Aided Material Modeling at the Karlsruhe Institute of Technology (KIT), within the Excellence Initiative of the German Research Foundation (DFG), and via the DFG grant FR-2702/3 is gratefully acknowledged.

References (37)

  • F. Fritzen et al., Computational homogenization of elasto-plastic porous metals, Int. J. Plast. (2012)
  • S. Roussette et al., Nonuniform transformation field analysis of elastic–viscoplastic composites, Composites Sci. Technol. (2009)
  • F. Fritzen et al., Nonuniform transformation field analysis of materials with morphological anisotropy, Composites Sci. Technol. (2011)
  • V. Sepe et al., A nonuniform TFA homogenization technique based on piecewise interpolation functions of the inelastic field, Int. J. Solids Struct. (2013)
  • S. Nemat-Nasser et al., Micromechanics: Overall Properties of Heterogeneous Materials
  • K. Bathe, Finite Element Procedures (1996)
  • G. Dvorak et al., On transformation strains and uniform fields in multiphase elastic media, Proc. R. Soc. London A (1992)
  • F. Fritzen et al., Three-dimensional finite element implementation of the nonuniform transformation field analysis, Int. J. Numer. Methods Eng. (2010)