Elsevier

Journal of Computational Physics

Volume 286, 1 April 2015, Pages 172-193
Journal of Computational Physics

Low-storage implicit/explicit Runge–Kutta schemes for the simulation of stiff high-dimensional ODE systems

https://doi.org/10.1016/j.jcp.2015.01.031Get rights and content

Abstract

Implicit/explicit (IMEX) Runge–Kutta (RK) schemes are effective for time-marching ODE systems with both stiff and nonstiff terms on the RHS; such schemes implement an (often A-stable or better) implicit RK scheme for the stiff part of the ODE, which is often linear, and, simultaneously, a (more convenient) explicit RK scheme for the nonstiff part of the ODE, which is often nonlinear. Low-storage RK schemes are especially effective for time-marching high-dimensional ODE discretizations of PDE systems on modern (cache-based) computational hardware, in which memory management is often the most significant computational bottleneck. In this paper, we develop and characterize eight new low-storage implicit/explicit RK schemes which have higher accuracy and better stability properties than the only low-storage implicit/explicit RK scheme available previously, the venerable second-order Crank–Nicolson/Runge–Kutta–Wray (CN/RKW3) algorithm that has dominated the DNS/LES literature for the last 25 years, while requiring similar storage (two, three, or four registers of length N) and comparable floating-point operations per timestep.

Introduction

Although a wide variety of methods have been used for spatial discretization and subgrid-scale modeling in the Direct Numerical Simulation (DNS) and Large Eddy Simulation (LES) of turbulent flows, time marching schemes for such systems have relied, in most cases, on an implicit scheme for the advancement of the stiff terms and an explicit scheme for the advancement of the nonstiff terms. Among these so-called IMEX schemes, an approach that gained favor due to [11] and [12] coupled the (implicit, second-order) Crank–Nicolson (CN) scheme for the stiff terms with the (explicit) second-order Adams–Bashforth (AB2) scheme for the nonstiff terms. This approach was refined in [13], which used the (implicit) CN scheme for the stiff terms, at each RK substep, together with the (explicit) third-order low-storage Runge–Kutta–Wray (RKW3) scheme [22] for the nonstiff terms. This venerable IMEX algorithm, dubbed CN/RKW3, still enjoys extensive use today, and is particularly appealing, as only two registers are required for advancing the ODE in time, though if three registers are used, the number of flops required by the algorithm may be significantly reduced. In high-dimensional discretizations of 3D PDE systems on modern computational hardware, the reduced memory footprint of this time marching algorithm, in its two-register or three-register form, can significantly reduce the execution time of a simulation. However, the CN/RKW3 scheme has the considerable disadvantage of being only second-order accurate, and its implicit part is only A-stable. In recent years, there have been relatively few attempts to refine the CN/RKW3 time-marching scheme for turbulence simulations, perhaps due to a mistaken notion that modifying it to achieve higher order might result in either increased storage requirements, significantly more computation per timestep, or the loss of A stability of the implicit part. It turns out that this is untrue; in fact, there is much to be gained by revising this algorithm.

When using an IMEX scheme, such as those described above, to march the incompressible Navier–Stokes equation, one natural choice is to treat the (linear) diffusion terms as the “stiff terms” and the (nonlinear) convective terms as the “nonstiff terms”. Note that a better choice for discretizations with significant grid clustering implemented in one or more spatial directions, as usually present when simulating wall-bounded turbulent flows, is to treat the diffusion and linearized convection terms with derivatives in the direction of most significant grid clustering (e.g., in the direction normal to the nearest wall) as the “stiff” terms, and the remaining terms as the “nonstiff” terms, as suggested by [1]. Note further that so-called fractional step methods are often combined with such IMEX schemes in order to enforce the incompressibility constraint (see, e.g., [13]). The present paper focuses exclusively on the IMEXRK part of such time-advancement algorithms; various creative choices for which terms to take implicitly at different points in the physical domain of interest, and various methods for implementing fractional step techniques for enforcing exactly the divergence-free constraint, may subsequently be addressed in an identical manner as discussed in [1] and [13], and elsewhere in the literature.

Over the last 30 years, there has been significant development of (full-storage) IMEXRK algorithms. A comprehensive review of this literature is given in [9], and a brief summary of this subject is given in Section 1.1 below, including the general structure of full-storage IMEXRK schemes, their general implementation, conditions on their parameters for second-, third-, and fourth-order accuracy, and characterizations of their stability.

Further, in the years since the development of RKW3 in [22], there has been significant development of alternative low-storage explicit RK schemes; a comprehensive review of this literature is given in [10], and a brief summary of this subject is given in Section 1.2 below, including the extension to implicit RK schemes, the introduction of a general 2-register IMEXRK form, efficient 3-register and 2-register implementations of this form, as well as the introduction of a general 3-register IMEXRK form, and efficient 4-register and 3-register implementations of this form.

We then develop eight new low-storage IMEXRK schemes well suited for turbulent flow simulations, and other computational grand challenge applications, using two, three, or four registers of length N (the dimension of the ODE under consideration). With an eye on the computational cost of their implementation, we focus on schemes with the smallest number of stages possible for a given order, stability, and storage requirement. A comprehensive summary of the schemes developed in this paper is given in Table 1. In short:

  • Section 2 presents two second-order, 2-register IMEXRK schemes:

    • the classic 3-stage, A-stable, CN/RKW3 scheme, and

    • a new, (2,3)-stage [that is, a scheme with 2 implicit stages and 3 explicit stages], L-stable, strong-stability-preserving scheme, dubbed IMEXRKCB2.

  • Section 3 presents five new third-order, 2-register IMEXRK schemes:

    • a (2,3)-stage, strongly A-stable scheme, dubbed IMEXRKCB3a,

    • a (3,4)-stage, strongly A-stable scheme with ESDIRK implicit part, dubbed IMEXRKCB3b, and

    • three (3,4)-stage, L-stable schemes:

      • one with coefficients selected to maximize stability of the ERK part on the negative real axis while being strong stability preserving, dubbed IMEXRKCB3c,

      • one with coefficients selected to be strong stability preserving for the maximum possible timestep, dubbed IMEXRKCB3d, and

      • one with coefficients selected to maximize accuracy of the ERK part, dubbed IMEXRKCB3e.

  • Section 4 presents a new third-order, 3-register, 4-stage, L-stable, stage-order-2 scheme dubbed IMEXRKCB3f.

  • Section 5 presents a new fourth-order, 3-register, 6-stage, L-stable, stage-order-2 scheme dubbed IMEXRKCB4.

In Section 6, we provide an analysis of the well-known order reduction phenomenon arising during the integration of very stiff ODEs using these IMEXRK schemes. Finally, Section 7 considers the application of all of these low-storage IMEXRK schemes, and some of their full-storage IMEXRK competitors, to a representative test problem in order to compare their computational efficiency.

A comprehensive review of (full-storage) IMEXRK schemes is given by Kennedy, Carpenter, and Lewis [9]. In short, IMEXRK schemes incorporate a coordinated pair of Diagonally Implicit Runge–Kutta (DIRK, with lower-triangular A) and Explicit Runge–Kutta (ERK, with strictly lower-triangular A) schemes, with parameters as summarized in the standard Butcher tableaux for the time advancement of an ODE of the formdx(t)dt=f(x,t)+g(x,t), where f(x,t) represents the stiff part of the RHS [advanced with the DIRK method at left in (1)], and g(x,t) represents the nonstiff part of the RHS [simultaneously advanced with the ERK method at right in (1)].

If the stiff part of the ODE is linear [that is, if f(x,t)=Ax] then, denoting the efficient solution of Ax=b as A1b, a full-storage implementation of the IMEXRK scheme in (1) to advance from x=xn to x=xn+1 proceeds as followsfork=1:sifk==1,y=x,else,y=x+i=1k1ak,iIMΔtfi+j=1k1ak,jEXΔtgj,endfk=A(Iak,kIMΔtA)1y[equivalently,fk=(Iak,kIMΔtA)1Ay]gk=g(y+akkIMΔtfk,tn+ckEXΔt)endxx+i=1sbiIMΔtfi+j=1sbjEXΔtgjxˆxˆ+i=1sbˆiIMΔtfi+j=1sbˆjEXΔtgj Line (3c) above is simply fk=f(z,tn+ckIMΔt), where z is the solution of e(z)=zyakkIMΔtf(z,tn+ckIMΔt)=0 [that is, where z=y+akkIMΔtf(z,tn+ckIMΔt)], in the special case that f(x,t)=Ax. More generally, if the stiff part f(x,t) is nonlinear, then line (3c) is replaced by a Newton–Raphson iteration (see [16]) to find the z such that e(z)=0:Initialize:z0=y+akkIMΔtf(y,tn+ckIMΔt)Iterate:(IakkIMΔtf(x,tn+ckIMΔt)x|x=zm)(zm+1zm)=zm+y+akkIMΔtf(zm,tn+ckIMΔt)Upon exit:fk=f(zconverged,tn+ckIMΔt)} The Jacobian used in this iteration may be computed analytically or approximated numerically. The low-storage IMEXRK algorithms developed in this work may be applied in the linear or nonlinear setting, mutatis mutandis; Sections 1.2.1 General three-register implementation of [2R] IMEXRK schemes, 1.2.2 General two-register implementation of [2R] IMEXRK schemes, 1.2.3 General four-register implementation of [3R] IMEXRK schemes, 1.2.4 General three-register implementation of [3R] IMEXRK schemes provide low-storage pseudocode implementations in the case in which the stiff part of the ODE is linear.

Finally, note that the bˆiIM and bˆiEX coefficients in the Butcher tableaux, if provided, are used to form a so-called embedded scheme to advance the solution at each timestep with an order of accuracy reduced by one with respect to the main scheme. Using this embedded scheme, one may estimate the error of the simulation at each timestep, and adjust the stepsize at the next iteration accordingly.

As is well known (see, e.g., [3]), for the DIRK and ERK components in (1), when used in isolation, to be first-order accurate, it is required thatτ1IM(1)=ibiIM1=0τ1EX(1)=ibiEX1=0, for these schemes, when used in isolation, to be second-order accurate, it is additionally required thatτ1IM(2)=ibiIMciIM1/2=0τ1EX(2)=ibiEXciEX1/2=0, for these schemes, when used in isolation, to be third-order accurate, it is additionally required thatτ1IM(3)=(1/2)ibiIMciIMciIM1/6=0τ1EX(3)=(1/2)ibiEXciEXciEX1/6=0τ2IM(3)=i,jbiIMaijIMcjIM1/6=0τ2EX(3)=i,jbiEXaijEXcjEX1/6=0, and for these schemes, when used in isolation, to be fourth-order accurate, it is additionally required thatτ1IM(4)=(1/6)ibiIMciIMciIMciIM1/24=0τ1EX(4)=(1/6)ibiEXciEXciEXciEX1/24=0τ2IM(4)=(1/3)i,jbiIMciIMaijIMcjIM1/24=0τ2EX(4)=(1/3)i,jbiEXciEXaijEXcjEX1/24=0τ3IM(4)=(1/2)i,jbiIMaijIMcjIMcjIM1/24=0τ3EX(4)=(1/2)i,jbiEXaijEXcjEXcjEX1/24=0τ4IM(4)=i,j,kbiIMaijIMajkIMckIM1/24=0τ4EX(4)=i,j,kbiEXaijEXajkEXckEX1/24=0. Recall that, in the scalar case, the exact solution of x=f(x)+g(x) has the following terms:xn+1=xn+Δtxn+(Δt)2xn/2!+(Δt)3xn/3!+O((Δt)4)=xn+Δt{f+g}(xn,tn)+(Δt)22!{ff+fg+gf+gg}(xn,tn)+(Δt)33!{fff+2ffg+fgg+gff+2gfg+ggg+fff+fgf+gff+ggf+ffg+fgg+gfg+ggg}(xn,tn)+O((Δt)4); note in particular that there are 2 terms at second order and 10 terms at third order that involve both f and g. For the DIRK and ERK components in (1), when used together in an IMEX fashion, to be second-order accurate, it is thus additionally required thatτ1IMEX(2)=ibiIMciEX1/2=0τ2IMEX(2)=ibiEXciIM1/2=0, for these schemes, when used together in an IMEX fashion, to be third-order accurate, it is additionally required thatτ1IMEX(3)=(1/2)ibiIMciEXciEX1/6=0τ2IMEX(3)=(1/2)ibiEXciIMciIM1/6=0τ3IMEX(3)=(1/2)ibiIMciIMciEX1/6=0τ4IMEX(3)=(1/2)ibiEXciIMciEX1/6=0τ5IMEX(3)=i,jbiIMaijEXcjEX1/6=0τ6IMEX(3)=i,jbiEXaijIMcjIM1/6=0τ7IMEX(3)=i,jbiEXaijEXcjIM1/6=0τ8IMEX(3)=i,jbiIMaijIMcjEX1/6=0τ9IMEX(3)=i,jbiIMaijEXcjIM1/6=0τ10IMEX(3)=i,jbiEXaijIMcjEX1/6=0, and for these schemes, when used together in an IMEX fashion, to be fourth-order accurate, 44 additional constraints are required (see [9]), which for brevity aren't listed here.

The stability of an RK scheme may be characterized by considering the model problem dx/dt=λx and defining z=λΔt, σ(z)=xn+1/xn, and σ()lim|z|σ(z). The stability function of an RK scheme with Butcher tableau parameters A and b is then given by σ(z)=1+zbT(IzA)11, where 1 denotes a vector of ones; the RK scheme is said to be stable for any z such that |σ(z)|1. Further, considering its application to stiff systems, an RK scheme is said to be

  • A-stable if |σ(z)|1 over the entire LHP of z,

  • strongly A-stable if it is A-stable and |σ()|<1, and

  • L-stable if it is A-stable and σ()=0.

The stability of an IMEXRK scheme is a bit more difficult to characterize. Of course, one may start by characterizing the stability of the implicit and explicit parts considered in isolation. To evaluate the stability of the implicit and explicit components of an IMEX scheme working together, we consider the model problem dx/dt=λfx+λgx, where the first term on the RHS is handled implicitly, and the second term on the RHS is handled explicitly. Defining zIM=λfΔt, zEX=λgΔt, and σ(zIM;zEX)=xn+1/xn, we may write (see [9])σ(zIM;zEX)=det[IzIMAIMzEXAEX+zIM1(bIM)T+zEX1(bEX)T]det[IzIMAIM]. We may then characterize the stability of the implicit and explicit parts of an IMEXRK scheme working in concert, when the implicit part of the problem is stiff, by looking at σ(zIM;zEX) as zIM for finite zEX.

Consider the 1D hyperbolic PDEu/t=f(u)/x; denoting by ui(t) the discretization of u(x,t) on N spatial grid points xi, and by u(t) a vector containing all of the ui(t), we write the spatial discretization of this PDE as the ODEdu/dt=L(u). If a TVD spatial discretization is used, such as a Godunov or MUSCL scheme with an appropriate flux limiter incorporated (see [14]), then applying a simple Explicit Euler time discretization to (7),un+1=un+ΔtL(un), under the appropriate CFL condition on the timestep, ΔtΔtCFL, results in a simulation exhibiting a total variation of the discrete solution which does not increase in time, that is,TV(un+1)TV(un),where TV(un)=j|uj+1nujn|. Strong-stability preserving (SSP) explicit time-discretization methods (see [17] and [18]) are simply higher-order time discretization methods that conserve this total variation diminishing property with a modified CFL condition on the timestep, ΔtcΔtCFL.

In [18] (see also [6]), a condition for an explicit Runge–Kutta scheme to be SSP has been developed. This condition states that if an s-stage explicit Runge–Kutta scheme is written in incremental form, that is,u(0)=unu(i)=j=0i1(αiju(j)+ΔtβijL(u(j)))fori=1,,sun+1=u(s), where all of the αij0, and if the forward Euler method applied to the ODE (7) arising from a TVD spatial discretization of the hyperbolic PDE (6) is strongly stable under the appropriate CFL restriction, then such an explicit Runge–Kutta method is SSP provided that all of the βij0 and that the following CFL restriction is fulfilled:ΔtcΔtCFL,c=mini,jαijβij. In case an explicit scheme is coupled with an implicit scheme, as in an IMEXRK formulation, then, provided the implicit scheme used to integrate the stiff part of the ODE is L-stable, in the stiff limit the time integration scheme becomes the explicit Runge–Kutta scheme, and the order of accuracy of the limiting scheme is greater than or equal to the order of accuracy of the IMEXRK scheme itself. Hence, as stated in [15], if the explicit part of the IMEXRK scheme is SSP, then the IMEXRK scheme will also be SSP in the stiff limit.

In [15], three full-storage second-order and two full-storage third-order IMEXRK schemes are presented which are SSP in the stiff limit; no other IMEXRK schemes with this SSP property were found in our review of the IMEXRK literature. The present paper derives three new IMEXRK schemes which are SSP in the stiff limit (one which is second-order and two which are third-order); unlike the schemes in [15], the IMEXRK schemes derived here are of the low-storage variety.

The existing literature on low-storage RK schemes to date appears to focus exclusively on explicit schemes. Note that a cavalier implementation of a full-storage ERK scheme [see the explicit part of (3a), (3b), (3c), (3d), (3e), (3f), (3g)] requires storage of the state vector [x], the intermediate vector [y], and s values of the RHS vectors [gk]; that is, s+2 vectors of length N, where x=xN×1. We now summarize the two main classes of low-storage ERK schemes,1 a comprehensive review of which is given in Kennedy, Carpenter, and Lewis [10].

The two-register Williamson class of ERK schemes [20], denoted “[2N]” schemes, may be written to advance from x=xn to x=xn+1 asfork=1:sifk==1,ΔxΔtg(x,tn+ckΔt),elseΔxαkΔx+Δtg(x,tn+ckΔt)endxx+βkΔxend If handled with care, such schemes can often be implemented efficiently in two registers of length N, x and Δx.

The two-register van der Houwen class of schemes [19], denoted “[2R]” schemes, restrict the parameters aij below the first subdiagonal in the Butcher tableau of the ERK scheme to be equal to the parameters bj of the corresponding column, and may thus be written to advance from x=xn to x=xn+1 asfork=1:sifk==1,yx,elseyx+(ak,k1bk1)Δtg(y,tn+ck1Δt)endxx+bkΔtg(y,tn+ckΔt)end Such schemes can often be implemented efficiently in two registers of length N (namely, x and y). If implemented with three registers, however, the function g(y,tn+ckΔt) can be computed just once per timestep (instead of twice). RKW3 [22] is a commonly-used example of a two-register, three-stage, third-order van der Houwen ERK scheme, with a Butcher tableau of

In the three-register van der Houwen class of schemes, denoted “[3R]” schemes, only the parameters aij below the second subdiagonal of the Butcher tableau of the ERK scheme must equal the parameters bj of the corresponding column. An effective implementation of such [3R] schemes that uses only three registers of length N (namely, x, y and z) is given byfork=1:sifk==1,yx,zx,else,zy+ak,k1Δtg(y,tn+ck1Δt)ifk<s,yx+(ak+1,k1bk1)g(y,tn+ck1Δt),endendxx+bkΔtg(y,tn+ckΔt)end Again, if implemented with four registers, the function g(y,tn+ckΔt) can be computed just once per timestep (instead of thrice). In the present work, we extend the two- and three-register van der Houwen classes of ERK schemes to the DIRK case, which can be accomplished with precisely the same restrictions on the (lower triangular) DIRK Butcher tableau as in the (strictly lower triangular) ERK case, as specified above. Further, we will develop coordinated pairs of such [2R] and [3R] DIRK and ERK schemes in the IMEX setting described in Section 1.1. In particular, we will develop a [2R] second-order IMEX scheme, [2R] and [3R] third-order IMEX schemes, and a [3R] fourth-order IMEX scheme.

As shown in Section 1.1, six constraints on the parameters of the IMEX Butcher tableaux (1) must be satisfied for second-order accuracy, fourteen additional constraints must be satisfied for third-order accuracy, and fifty-two additional constraints must be satisfied for fourth-order accuracy. Before proceeding, we thus introduce some significant simplifying assumptions. Following [15] and [9] and the CN/RKW3 scheme of [13], we synchronize the stages of DIRK and ERK components by imposing ciIM=ciEX=ci for i=1,,s. We also coordinate the constituent DIRK and ERK components such that biIM=biEX=bi for i=1,,s, as also done in [15] and [9], but which is not satisfied by CN/RKW3. Finally, for each stage, a stage-order of one is also imposed such thatj=1iaijIM=j=1i1aijEX=cifori=1,,s; it follows that c1=a11IM=a11EX=0. As a result of these assumptions, the number of constraints on the IMEX parameters [see (4a), (4b), (4c), (4d), (4e), (4f), (4g), (4h), (4i), (4j), (4k), (4l), (4m), (4n)] for second-order accuracy is reduced to just two, the number of constraints for third-order accuracy is reduced to five, and the number of constraints for fourth-order accuracy is reduced to fourteen.

For several of the IMEXRK schemes developed in this paper, a lower-order embedded scheme is also developed, relaxing the bˆiIM=bˆiEX restriction to provide increased freedom during the design phase. As a general guideline, none of the leading-order truncation terms of an embedded scheme should vanish, so that each of these terms will contribute to the error estimate (subject to this restriction, the remaining free parameters of the embedded scheme are then optimized to maximize the magnitude of the leading-order truncation terms). Unfortunately, this is not always achievable; as a result, not all of the schemes developed in this paper are listed with embedded schemes. For all of the embedded schemes we do report, the DIRK part of the embedded scheme is at least A-stable, which is a property of the embedded scheme recommended by [8]; note, however, that the embedded scheme is not used for time marching, it is only used to adjust the timestep.

The IMEX Butcher tableaux in (1) for coordinated pairs of [2R] DIRK and ERK schemes are thus simplified to and the IMEX Butcher tableaux for coordinated pairs of [3R] DIRK and ERK schemes are simplified to Note also that, as the DIRK component, the IMEXRK form considered above has an explicit first stage, its stability function (5) may be writtenσ(zIM;zEX)=1+i=1spi(zEX)[zIM]i1+i=1s1qi[zIM]iwhere pi(zEX)=j=0sipˆij[zEX]j.

Note that, if the stiff part of the ODE is linear [that is, if f(x,t)=Ax] then, denoting the efficient solution of Ax=b as A1b, a straightforward implementation of the low-storage IMEXRK scheme in (16) that uses three registers2 of length N to advance from x=xn to x=xn+1 proceeds as follows:fork=1:sifk==1,yx,elseyx+(ak,k1IMbk1IM)Δtz+(ak,k1EXbk1EX)Δtyendz=(Iak,kIMΔtA)1Ayyg(y+ak,kIMΔtz,tn+ckEXΔt)xx+bkIMΔtz+bkEXΔtyxˆxˆ+bˆkIMΔtz+bˆkEXΔtyend where z and y store the implicit and explicit parts of the RHS at each stage, x is used to advance the solution of the main scheme,3 and xˆ stores the solution of the embedded scheme if adaptive time stepping is implemented. Note that one linear solve of the form (IcA)1b, one matrix/vector product Ay, and one nonlinear function evaluation g(y,t) are computed per stage, in addition to various level-1 BLAS (basic linear algebra subroutine) operations. As discussed in Section 1.1, implementation in the case of a nonlinear stiff part is a straightforward extension.

By applying the matrix inversion lemma (Aˆ+BˆCˆDˆ)1=Aˆ1Aˆ1Bˆ(Cˆ1+DˆAˆ1Bˆ)1DˆAˆ1 with Aˆ=Cˆ=I, Dˆ=A, and B=ak,kIMΔt, the algorithm laid out in Section 1.2.1 may be rewritten in a form that only requires two registers2 of length N:fork=1:sifk==1,yx,elseyx+(ak,k1IMbk1IM)ΔtAy+(ak,k1EXbk1EX)Δtg(y,tn+ck1EXΔt)endy(Iak,kIMΔtA)1yxx+bkIMΔtAy+bkEXΔtg(y,tn+ckEXΔt)xˆxˆ+bˆkIMΔtAy+bˆkEXΔtg(y,tn+ckEXΔt)end In this case, one linear solve of the form (IcA)1b and two operations of the form4 x+cAy+dg(y,t) are computed per stage, in addition to various level-1 BLAS operations. However, the storage requirement is reduced from three registers of length N to only two, which is quite significant. In many cases, some of the coefficients in the above algorithm turn out to be zero, so the increased computational cost associated with the extra nonlinear function evaluations and matrix/vector products in this implementation is not as bad as one might initially anticipate, as quantified in Section 7.

For the development of the stage-order-two schemes IMEXRKCB3f and IMEXRKCB4 in Section 4 and Section 5, the [3R] IMEXRK structure (17) will be used to provide increased freedom during the design phase. Such schemes admit the following four-register implementation:fork=1:sifk==1,yx,zIM=x,zEXx,elsezEXy+ak,k1EXΔtzEXifk<s,yx+(ak+1,k1IMbk1IM)ΔtzIM+(ak+1,k1EXbk1EX)(zEXy)/ak,k1EX,endzEXzEX+ak,k1IMΔtzIMendzIM=(Iak,kIMΔtA)1AzEXzEXg(zEX+ak,kIMΔtzIM,tn+ckEXΔt)xx+bkIMΔtzIM+bkEXΔtzEXxˆxˆ+bˆkIMΔtzIM+bˆkEXΔtzEXend where zIM and zEX store the implicit and explicit parts of the RHS at each stage, y is a temporary variable which contributes to advance the solution to the next stage, x is used to advance the solution of the main scheme, and xˆ stores the solution of the embedded scheme if adaptive timestepping is used. As in the three-register implementation of the [2R] scheme, only one linear solve of the form (IcA)1b, one matrix/vector product, and one nonlinear function evaluation are computed per stage.

Leveraging matrix inversion lemma as done in Section 1.2.2, we obtain a general three-register implementation of any [3R] IMEXRK scheme:fork=1:sifk==1,yx,zx,elseifk<szy+ak,k1IMΔtAzyA1(zy)/(ak,k1IMΔt)zz+ak,k1EXΔtg(y,tn+ck1EXΔt)yx+(ak+1,k1IMbk1IM)ΔtAy+(ak+1,k1EXbk1EX)Δtg(y,tn+ck1EXΔt)elsezy+ak,k1IMΔtAz+ak,k1EXΔtg(y,tn+ck1EXΔt)endendz(Iak,kIMΔtA)1zxx+bkIMΔtAz+bkEXΔtg(z,tn+ckEXΔt)xˆxˆ+bˆkIMΔtAz+bˆkEXΔtg(z,tn+ckEXΔt)end Note that this algorithm requires the invertibility of the matrix A, a condition that is often true when A arises from the discretization of a PDE. In this case, two linear systems, three matrix/vector products, and three nonlinear function evaluations must be performed per stage (except for the last stage), plus an additional matrix/vector product and one nonlinear function evaluation if the embedded scheme is used for adaptive timestepping.

Finally, note that a (hardware-dependent) trade-off between flops and storage must ultimately be conducted to select between the two-register and three-register implementation of any [2R] scheme, or between the three-register and four-register implementation of any [3R] scheme.

Section snippets

Two second-order, 2-register IMEXRK schemes

The classical second-order, A-stable CN/RKW3 method may easily be written in the low-storage IMEXRK Butcher tableaux form (16) (albeit with the biIM=biEX=bi constraint relaxed) with the four-stage IMEX Butcher tableaux A DIRK scheme with c1=0 and cs=1 [such as that shown at left in (23)] is known as a first-same-as-last (FSAL) scheme. In such a scheme, the implicit part of the last stage of one timestep is precisely the implicit part of the first stage of the next timestep, and thus an FSAL

A (2,3)-stage, strongly A-stable scheme

As suggested by (24), to streamline the implementation, we can suppress the first stage of the DIRK scheme by imposing b1=a21IM=0. Following this simplification, the entire first column of the DIRK scheme is zero, thus leading to a scheme with s1 implicit stages and s explicit stages. In the s=3 case, the IMEXRK Butcher tableaux take the general form To achieve third-order accuracy, after imposing stage-order-one conditions on both implicit and explicit part, we arrive at five nonlinear

A third-order, 3-register, 4-stage, L-stable scheme

All of the schemes so-far described have stage-order one for both the implicit and explicit components. It is well known in the literature (see [7]) that this limits the order of convergence of such methods when applied to stiff ODEs. In particular, if the stiffness is so high that the ODE turns into an index-1 DAE, some variables convert from differential to algebraic and their convergence rate is determined by the stage-order of the method. For this reason, an attempt has been made to improve

A fourth-order, 3-register, 6-stage, L-stable scheme

Solving the nonlinear system of equations arising from the imposition of the fourth-order accuracy constraints is a difficult task. For this reason, stage-order conditions higher than one are usually imposed, as pointed out in [8]. These conditions simplify the search for a solution by significantly reducing the nonconvexity of the corresponding optimization problem. For this reason, after imposing the same bi and ci over the explicit and implicit components and stiff accuracy for the implicit

Order reduction

We now consider the order reduction present when the schemes developed above are applied to the van der Pol equation. It is well documented in the literature (see, e.g., [8]) that whenever an RK method is used to integrate a singular perturbation problem (that is, an ODE characterized by a stiffness parameter ε whose behavior transitions towards that of an index-1 DAE as the stiffness increases), the observed convergence rate appears to be lower than the nominal order of accuracy of the RK

Computational cost

To illustrate the relative computational cost of our new low-storage IMEXRK schemes on a representative PDE model problem discretized on N1 gridpoints, we now compare the efficient implementation of each of the methods developed herein to CN/RKW3 and several full-storage IMEX Runge–Kutta schemes available in literature. We consider as a model PDE problem the one-dimensional Kuramoto–Sivashinsky equationut=uux2ux24ux4 over the domain x[L/2,L/2] with u=u/x=0 at x=±L/2, where L is

Conclusions

We have developed eight new IMEX Runge–Kutta schemes with reduced storage requirements, the properties of which are succinctly summarized and compared with competing schemes in Table 1. It is seen that:

  • IMEXRKCB2 is second-order accurate, like CN/RKW3; IMEXRKCB3a–3f are third-order accurate, and IMEXRKCB4 is fourth-order accurate.

  • IMEXRKCB2 and 3a–3e, like CN/RKW3, admit both two-register and three-register implementations, with the three-register implementations requiring slightly fewer flops.

Acknowledgements

The authors would like to thank one of the reviewers for the especially valuable comments and suggestions, which contributed to improve the results of the present work. The authors also gratefully acknowledge the financial support of AFOSR FA9550-12-1-0046 and NSF CNS-1035828.

References (22)

  • M.H. Carpenter et al.

    Fourth-order Runge–Kutta schemes for fluid mechanics applications

    J. Sci. Comput.

    (2005)
  • Cited by (50)

    • An assessment of implicit-explicit time integrators for the pseudo-spectral approximation of Boussinesq thermal convection in an annulus

      2022, Journal of Computational Physics
      Citation Excerpt :

      Schemes located in the top right quadrant of each panel are more accurate and yet more efficient than CNAB2. Six IMEX-RK schemes are systematically located in this quadrant, one scheme of order 2, SMR432 [29], four schemes of order 3, ARS343 [17], CB443 [32], CFN343 [58], KC443 [51] and one scheme of order 4, KC664 [51]. At the operational, dissipation-based limit of stability, it is remarkable to note that ARS343 and KC664 enable a significant improvement of accuracy at a lower cost, regardless of the configuration.

    View all citing articles on Scopus
    View full text