GPU accelerated computational homogenization based on a variational approach in a reduced basis framework

https://doi.org/10.1016/j.cma.2014.05.006

Abstract

Computational multiscale methods such as the FE² technique (Feyel, 1999) entail large demands on both CPU time and memory. In order to significantly reduce the computational cost of multiscale methods, the authors recently proposed a hybrid computational homogenization method for visco-plastic materials using a reduced basis approach in a mixed variational formulation (Fritzen and Leuschner, 2013). In the present contribution two extensions of the method are introduced: First, the previous proposal is extended by allowing for heterogeneous hardening variables instead of piecewise constant fields, which improves the accuracy of the method. Second, a massively parallel GPU implementation of the algorithm using Nvidia's CUDA framework is presented. The GPU subroutines for the batched linear algebraic operations are integrated into a specialized library in order to facilitate their use. The impact of the heterogeneous hardening states on the accuracy and the performance gains obtained from the dedicated GPU implementation are illustrated by means of numerical examples. An overall speedup on the order of 10⁴ with respect to a high-performance finite element implementation is achieved while preserving good accuracy of the predicted nonlinear material response.

Introduction

Modern engineering mechanics simulations are dedicated to the prediction of the mechanical behavior of advanced materials. New types of lightweight construction materials, e.g. fiber-reinforced composites in the aircraft industry, appear homogeneous from a macroscopic point of view although they display strong heterogeneities on smaller scales. These inhomogeneities on the microscale influence the effective macroscopic mechanical behavior. The determination of the effective material properties based on evaluations on the microscale is called homogenization. The underlying analytical theory is well documented by a huge body of literature (see, e.g., the compendium of Nemat-Nasser and Hori [1]). However, analytical methods have their limitations for increasing geometric complexity of the microstructure and when it comes to accurate predictions of nonlinear material behavior.

The fast-growing information technology sector and the attendant increase in computing capacities gave rise to the development of appropriate tools in the field of computer-assisted mechanical simulation. In particular, the well-known finite element method (FEM, e.g., Bathe [2]) has proven to be a powerful tool in computer-aided engineering. The use of a nested FEM approach for homogenization problems was introduced by Feyel [3] in terms of the nested finite element method FE². Thereby, a discretized microstructure is assigned to every integration point on the macroscale and solved numerically using an individual FEM simulation. However, this method results in massive computational effort and tremendous memory requirements, even for two-dimensional problems.

The unacceptable computational cost of FE² solutions gave rise to the development of reduced models, namely the Transformation Field Analysis (TFA, Dvorak and Benveniste [4]) and its successor, the Nonuniform Transformation Field Analysis (NTFA, [5], [6], [7], [8]). The key idea of the NTFA is the approximation of the inelastic strain field using a finite-dimensional, spatially heterogeneous basis, the elements of which are denoted as inelastic modes. The number of degrees of freedom of the reduced problem is the number of mode activity coefficients, which represent effective (macroscopic) internal variables. Usually this number is several orders of magnitude smaller than in a full-field FEM simulation, which explains the computational efficiency of the method. In its original form [5], [6], the NTFA determined the evolution of the mode activity based on a semi-phenomenological law. While giving good accuracy in comparison with full-field simulations, the simplicity of the evolution law is the bottleneck of the NTFA, limiting its field of application to rather simple material models. The NTFA differs from other well-established model order reduction methods such as hyper-reduction [9], [10] or the proper generalized decomposition (PGD, e.g., [11]) due to the consideration of the micromechanics in the reduced basis formulation.

In order to overcome the limitations of former implementations of the NTFA, Fritzen and Böhlke [12] proposed a technique for the evolution of the mode activity coefficients of viscoelastic composites based on the exact homogenization of the macroscopic free energy and the dissipation. Thereby, the prior drawbacks of the semi-phenomenological approach could be eliminated. The extension recently presented by Fritzen and Leuschner [13] eliminates the restriction to rather simple material models on the microscale, which was a major drawback of the NTFA. In the new method the constitutive models on the microscale may be chosen from the class of Generalized Standard Materials (GSM, [14], [15], [16]) in a unified framework. The resulting reduced order homogenization method is suitable for many real-world applications. It no longer incorporates any phenomenological assumptions regarding the evolution of the mode activity coefficients. The latter now stems from a mixed incremental multiscale formulation derived from the incremental variational approach of Ortiz and Stainier [17], incorporating ideas of, e.g., [18]. The mixed incremental variational approach was found by these authors to give accurate results. However, the exceptional computational efficiency of the NTFA, which allowed for savings of up to a factor of 10¹⁰, reduces to savings in the range of 10². The increase in computational effort is due to the large number of rather simple function evaluations that have to be conducted repeatedly on the microscopic level. More precisely, the free energy and the dual dissipation potential and their gradients have to be evaluated at each microscopic position in each iteration of the time integration procedure. In addition, small linear algebraic operations such as matrix multiplications are required after each of these function calls, which results in a considerable overall computational cost even though the algorithmic complexity of the individual operations is negligible.

In the present paper an acceleration of the multiscale method of Fritzen and Leuschner [13] is presented, exploiting massive parallelization of the independent function evaluations on graphics processing units (GPUs). Over the past decade, so-called general-purpose computing on graphics processing units (GPGPU, [19]) has gained massive influence in science, e.g. in the fields of dense linear algebra [20], sorting algorithms [21] and engineering-related applications such as atomistic simulations [22] or fluid dynamics [23]. In comparison with a central processing unit (CPU), a GPU is dedicated to the concurrent, i.e. parallel, execution of the same instruction on different sets of data. The benefits of GPUs are their high computational capacity and their high memory bandwidth, which are both related to the level of parallelism. GPGPU requires new kinds of programming models, languages and paradigms. We decided to use the CUDA framework of Nvidia, which supports different programming languages, in particular C/C++ [24]. The framework provides access to the GPU via user-written routines (the kernels) that are executed on the GPU. This yields a heterogeneous structure of the source code, which is written for both the CPU and the GPU.
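To make this heterogeneous structure concrete, the following minimal CUDA sketch shows host (CPU) code launching a kernel that many GPU threads execute concurrently, each on different data. It is a generic example, not taken from the library developed in this work; all identifiers (scale_kernel, n, etc.) are chosen purely for illustration.

```cuda
#include <cuda_runtime.h>

// Minimal illustration of the heterogeneous CUDA programming model: host (CPU)
// code launches a kernel that many GPU threads execute concurrently, each on
// different data. All identifiers are illustrative, not from the paper's library.
__global__ void scale_kernel(const double* in, double* out, double factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x; // global thread index
    if (i < n)                                     // guard against excess threads
        out[i] = factor * in[i];                   // same instruction, different data
}

int main()
{
    const int n = 1 << 20;
    double *d_in, *d_out;
    cudaMalloc(&d_in,  n * sizeof(double));        // allocate device memory
    cudaMalloc(&d_out, n * sizeof(double));
    // ... fill d_in, e.g. via cudaMemcpy from a host array ...
    const int block = 256;                         // threads per block
    const int grid  = (n + block - 1) / block;     // blocks needed to cover n
    scale_kernel<<<grid, block>>>(d_in, d_out, 2.0, n);
    cudaDeviceSynchronize();                       // kernel launches are asynchronous
    cudaFree(d_in);
    cudaFree(d_out);
    return 0;
}
```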

In this work a high-performance numerical library for Nvidia GPUs is developed in order to accelerate the homogenization procedure by exploiting the capabilities offered by modern GPUs. In Section 2 the governing equations for the mechanical two-scale problem are derived. An extension of the method allowing for spatially heterogeneous hardening variables is introduced, which improves the accuracy in the presence of highly heterogeneous hardening states. The underlying Newton–Raphson scheme used for the iterative computation of the evolution of the mode activity coefficients is analyzed in detail with respect to its algorithmic properties. The parts of the algorithm that are attractive candidates for GPU acceleration are identified in Section 3. These are the aforementioned batched operations, i.e. matrix–vector and matrix–matrix–matrix (triple matrix) multiplications, as well as the computation of the energetic quantities, namely the gradients of the local potentials. These operations have to be evaluated at each integration point of the microscale problem, which can lead to hundreds of thousands of function calls per iteration, depending on the resolution of the discretized representative volume element (RVE) on the microscale. For an efficient implementation of these operations, the new STMV (Single Thread Multiple Values) principle for the batched processing of small independent problems is presented in Section 3 and sketched below. The particular hardware structure of the GPU and the properties of the CUDA framework are taken into account in order to attain efficient algorithms. The performance of the GPU accelerated implementations of the batched linear algebraic operations is demonstrated in Section 4 by comparison to CPU solutions. Finally, Section 4.8 provides performance considerations for the GPU accelerated implementation of the complete reduced basis homogenization algorithm, which incorporates all of the aforementioned GPU kernels. The GPU implementation is used to predict the nonlinear overall behavior of a realistic high-resolution microstructural problem representing a fiber reinforced material with a visco-plastic matrix. The computational gains provided by the GPU supported implementation of the linear algebraic subroutines are found to be on the order of 10 over state-of-the-art CPU implementations (one GPU versus a 24-core workstation). Consequently, the RB-MOR scheme is accelerated by a factor of 10⁴ and higher, while preserving a high accuracy of the anisotropic material response in the reduced framework.
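The STMV idea can be sketched as follows: instead of several threads cooperating on one product, each GPU thread processes one complete small problem, e.g. one matrix–vector product per integration point. The following hypothetical kernel illustrates this for batched 6×6 products; the contiguous per-point array layout and all names are assumptions for illustration, not the paper's actual interface.

```cuda
// Hypothetical sketch of the STMV principle: each thread owns one complete
// small problem (here: one 6x6 matrix-vector product per integration point)
// instead of several threads cooperating on a single product. The contiguous
// per-point array layout and all names are assumptions for illustration.
__global__ void batched_matvec6(const double* A,   // ngp blocks of 36 entries (row-major)
                                const double* x,   // ngp blocks of 6 entries
                                double*       y,   // ngp blocks of 6 entries
                                int           ngp)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x; // one thread = one integration point
    if (p >= ngp) return;
    const double* Ap = A + 36 * p;
    const double* xp = x + 6 * p;
    double*       yp = y + 6 * p;
    for (int i = 0; i < 6; ++i) {
        double s = 0.0;
        for (int j = 0; j < 6; ++j)
            s += Ap[6 * i + j] * xp[j];
        yp[i] = s;
    }
}
// launch, e.g.: batched_matvec6<<<(ngp + 255) / 256, 256>>>(A, x, y, ngp);
```

In a tuned implementation the memory layout would be reconsidered (e.g. a structure-of-arrays ordering) so that neighboring threads access neighboring addresses and the loads coalesce.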

An index-free notation is used with lowercase symbols denoting scalars, e.g. $c, \psi, \phi$, lowercase bold face letters denoting first-order tensors (or vectors), e.g. $\boldsymbol{x}, \boldsymbol{u}$, and uppercase Latin and lowercase Greek bold face letters representing second-order tensors, e.g. $\boldsymbol{A}, \boldsymbol{\sigma}, \boldsymbol{\varepsilon}$. Fourth-order tensors are typeset as $\mathbb{C}, \mathbb{A}, \mathbb{P}$ and alike. The two isotropic projectors are defined using the identity tensor $\boldsymbol{I}$ and the identity on symmetric tensors $\mathbb{I}^{\mathrm{s}}$ by
$$\mathbb{P}_1 = \tfrac{1}{3}\, \boldsymbol{I} \otimes \boldsymbol{I}, \qquad \mathbb{P}_2 = \mathbb{I}^{\mathrm{s}} - \mathbb{P}_1 .$$
A six-dimensional basis of orthonormal symmetric second-order tensors $\boldsymbol{B}^{(\alpha)}$ ($\alpha = 1, \dots, 6$) and a five-dimensional basis of orthonormal deviatoric tensors $\boldsymbol{B}'^{(\alpha)}$ ($\alpha = 1, \dots, 5$) are assumed. Besides tensorial quantities, which intrinsically live in a spatial setting, a series of real-valued tuples is used. These are referred to as vectors and are typeset as lowercase letters with a hat accent, e.g. $\hat{\xi}, \hat{\tau}$. In analogy, matrices are denoted by uppercase letters $\hat{A}$. A symmetric second-order tensor $\boldsymbol{\sigma}$ then has an equivalent vector representation
$$\boldsymbol{\sigma} = \sum_{\alpha=1}^{6} \sigma_\alpha \boldsymbol{B}^{(\alpha)} \quad \longleftrightarrow \quad \hat{\sigma} = (\sigma_1, \dots, \sigma_6)^{\mathsf{T}} .$$
Similarly, a fourth-order tensor $\mathbb{C}$ with both minor symmetries has a $6 \times 6$ matrix representation $\hat{C}$. Arguments such as the position $\boldsymbol{x}$ or the time $t$ are omitted for brevity if not explicitly required in the given context. Further, we define the averaging operator
$$\langle \bullet \rangle = \frac{1}{|\Omega|} \int_\Omega \bullet \; \mathrm{d}V ,$$
where $\bullet$ may represent any field variable.
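The text does not spell out the basis $\boldsymbol{B}^{(\alpha)}$ explicitly; one standard choice satisfying the orthonormality requirement is the normalized Voigt (Mandel) basis, stated here only for concreteness:

```latex
% One standard orthonormal basis of symmetric second-order tensors
% (normalized Voigt / Mandel form); the factor sqrt(2)/2 ensures
% B^{(\alpha)} \cdot B^{(\beta)} = \delta_{\alpha\beta}.
\begin{align*}
  B^{(1)} &= e_1 \otimes e_1, &
  B^{(2)} &= e_2 \otimes e_2, &
  B^{(3)} &= e_3 \otimes e_3, \\
  B^{(4)} &= \tfrac{\sqrt{2}}{2} \left( e_2 \otimes e_3 + e_3 \otimes e_2 \right), &
  B^{(5)} &= \tfrac{\sqrt{2}}{2} \left( e_1 \otimes e_3 + e_3 \otimes e_1 \right), &
  B^{(6)} &= \tfrac{\sqrt{2}}{2} \left( e_1 \otimes e_2 + e_2 \otimes e_1 \right).
\end{align*}
```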

Section snippets

Two-scale problem

In the following, attention is confined to two-scale isothermal mechanical problems. In this case the material length scale $l$ and the structural length scale $L$ are clearly separated (see Fig. 1), in the sense that the fluctuations of macroscopic fields are assumed to be negligible for spatial variations on the order of the size of the representative volume element (RVE). Macroscopic variables are over-lined, such as the displacement field $\bar{\boldsymbol{u}}$, the strain field $\bar{\boldsymbol{\varepsilon}} = \operatorname{sym}(\operatorname{grad}_{\bar{\boldsymbol{x}}}(\bar{\boldsymbol{u}}))$ and …
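The snippet is truncated; for orientation, the classical first-order homogenization relations that are consistent with the averaging operator $\langle \bullet \rangle$ defined above read (standard theory, restated here rather than quoted from the paper):

```latex
% Classical first-order homogenization relations (standard theory, restated
% for orientation; not quoted from the truncated snippet above): macroscopic
% strain and stress are volume averages, linked by the Hill-Mandel condition.
\bar{\boldsymbol{\varepsilon}} = \langle \boldsymbol{\varepsilon} \rangle ,
\qquad
\bar{\boldsymbol{\sigma}} = \langle \boldsymbol{\sigma} \rangle ,
\qquad
\bar{\boldsymbol{\sigma}} \cdot \dot{\bar{\boldsymbol{\varepsilon}}}
  = \langle \boldsymbol{\sigma} \cdot \dot{\boldsymbol{\varepsilon}} \rangle
\quad \text{(Hill--Mandel condition)} .
```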

Embedded GPU accelerated parts

In the following, a finite element discretization of the RVE $\Omega$ is assumed. The number of integration points of the discretized FE problem is denoted by $n_{\mathrm{gp}}$. According to Section 2.4, the increments of the effective internal variables $\hat{\xi}$ and $\hat{\lambda}$ are determined via the root finding problem (44). The structure of the evolution law requires function evaluations at every integration point $\boldsymbol{x}_i$ on the microscale. These are necessary for the assembly of the right hand side vector $\hat{f}$ in (44) and the …
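Among the batched operations named in the introduction, the triple matrix product is the most involved per integration point. A hypothetical STMV-style kernel for it could look as follows; fixed 6×6 matrices and a contiguous per-point layout are assumed purely for illustration, whereas in the actual method the dimensions depend on the number of modes.

```cuda
// Hypothetical STMV-style kernel for the batched triple matrix product
// D_p = A_p * B_p * C_p over all integration points p. Fixed 6x6 matrices and
// a contiguous per-point layout are assumed here purely for illustration; in
// the actual method the dimensions depend on the number of modes.
__global__ void batched_triple_mm6(const double* A, const double* B,
                                   const double* C, double* D, int ngp)
{
    int p = blockIdx.x * blockDim.x + threadIdx.x; // one thread per point
    if (p >= ngp) return;
    const double *Ap = A + 36 * p, *Bp = B + 36 * p, *Cp = C + 36 * p;
    double T[36];                                  // T = B_p * C_p (thread-local)
    for (int i = 0; i < 6; ++i)
        for (int j = 0; j < 6; ++j) {
            double s = 0.0;
            for (int k = 0; k < 6; ++k)
                s += Bp[6 * i + k] * Cp[6 * k + j];
            T[6 * i + j] = s;
        }
    double* Dp = D + 36 * p;                       // D_p = A_p * T
    for (int i = 0; i < 6; ++i)
        for (int j = 0; j < 6; ++j) {
            double s = 0.0;
            for (int k = 0; k < 6; ++k)
                s += Ap[6 * i + k] * T[6 * k + j];
            Dp[6 * i + j] = s;
        }
}
```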

General benchmarking methodology

When it comes to the benchmarking of parallel algorithms on different platforms, it is hard to obtain objective data: depending on the presentation of the results, either the parallel or the sequential implementation can easily be favored (see, e.g., [37]). In order to provide information that is as objective as possible, we have developed a specific benchmarking methodology: in the following Sections 4.2–4.7 an attempt to assess the performance of our GPU implementation is made. More specifically, we first …
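One practical ingredient of fair GPU timing is the use of CUDA events, since kernel launches are asynchronous and a naive CPU wall clock around the launch would under-report the GPU time. The following self-contained sketch (with a placeholder kernel; not the paper's actual benchmark harness) shows the pattern:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

__global__ void dummy_kernel(double* x, int n)     // stand-in for a kernel under test
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0;
}

// Timing with CUDA events: kernel launches are asynchronous, so a naive CPU
// wall clock around the launch would under-report the actual GPU time.
int main()
{
    const int n = 1 << 20;
    double* d_x;
    cudaMalloc(&d_x, n * sizeof(double));
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    cudaEventRecord(start);
    dummy_kernel<<<(n + 255) / 256, 256>>>(d_x, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);                    // wait until 'stop' is reached
    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);        // elapsed GPU time in ms
    printf("kernel time: %.3f ms\n", ms);
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_x);
    return 0;
}
```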

Summary and conclusions

A new heterogeneous programming model in the context of order-reduction based homogenization methods was investigated. The theoretical framework of the homogenization scheme is the reduced basis model order reduction (RB-MOR) recently proposed by the authors [13]. The approach extends the Nonuniform Transformation Field Analysis (NTFA) to a more general form, such that solids of the class of Generalized Standard Materials (GSM) can be considered. In the present work two …

Acknowledgments

The authors gratefully acknowledge the detailed and valuable comments of the anonymous reviewers.

Funding of this work through the Young Investigator Group (YIG) Computer Aided Material Modeling at the Karlsruhe Institute of Technology (KIT), within the Excellence Initiative of the German Research Foundation (DFG), and via the DFG grant FR-2702/3 is gratefully acknowledged.

References (37)

  • F. Fritzen et al., Computational homogenization of elasto-plastic porous metals, Int. J. Plast. (2012)
  • S. Roussette et al., Nonuniform transformation field analysis of elastic–viscoplastic composites, Composites Sci. Technol. (2009)
  • F. Fritzen et al., Nonuniform transformation field analysis of materials with morphological anisotropy, Composites Sci. Technol. (2011)
  • V. Sepe et al., A nonuniform TFA homogenization technique based on piecewise interpolation functions of the inelastic field, Int. J. Solids Struct. (2013)
  • S. Nemat-Nasser et al., Micromechanics: Overall Properties of Heterogeneous Materials
  • K. Bathe, Finite Element Procedures (1996)
  • G. Dvorak et al., On transformation strains and uniform fields in multiphase elastic media, Proc. R. Soc. London A (1992)
  • F. Fritzen et al., Three-dimensional finite element implementation of the nonuniform transformation field analysis, Int. J. Numer. Methods Eng. (2010)