Published in: BIT Numerical Mathematics 4/2022

Open Access 07.02.2022

Order theory for discrete gradient methods

Author: Sølve Eidnes



Abstract

The discrete gradient methods are integrators designed to preserve invariants of ordinary differential equations. From a formal series expansion of a subclass of these methods, we derive conditions for arbitrarily high order. We derive specific results for the average vector field discrete gradient, from which we get P-series methods in the general case, and B-series methods for canonical Hamiltonian systems. Higher order schemes are presented, and their applications are demonstrated on the Hénon–Heiles system and a Lotka–Volterra system, and on both the training and integration of a pendulum system learned from data by a neural network.
Notes
Communicated by Christian Lubich.
The research was supported by the Research Council of Norway, through the projects SPIRIT (No. 231632) and BigDataMine (No. 309691).

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Energy preservation and discrete gradient methods

For an ordinary differential equation (ODE)
$$\begin{aligned} \dot{x} = f(x), \quad x \in \mathbb {R}^d, \quad f : \mathbb {R}^d \rightarrow \mathbb {R}^d, \end{aligned}$$
(1)
a first integral, or invariant, is a function \(H : \mathbb {R}^d \rightarrow \mathbb {R}\) such that \(H(x(t)) = H(x(t_0))\) along the solution curves of (1). If we can write
$$\begin{aligned} f(x) = S(x)\nabla H(x), \end{aligned}$$
(2)
where \(S : \mathbb {R}^{d} \rightarrow \mathbb {R}^{d \times d}\) is such that S(x) is a skew-symmetric matrix, then (1) preserves H: this follows from the skew-symmetry of S(x), which yields
$$\begin{aligned} \frac{\mathrm {d}}{\mathrm {d}t} H(x) = \nabla H(x)^T \dot{x} = \nabla H(x)^T S(x) \nabla H(x) = 0. \end{aligned}$$
(3)
The converse is also true: McLachlan et al. showed in [30] that, whenever (1) has a first integral H, there exists a skew-symmetric matrix S(x), bounded near every non-degenerate critical point of H, such that (1) can be written in what is called the skew-gradient form:
$$\begin{aligned} \dot{x} = S(x) \nabla H(x). \end{aligned}$$
(4)
The proof provided in [30] for this is based on presenting a general form of one such S(x), the so-called default formula
$$\begin{aligned} S(x) = \frac{f(x) \nabla H(x)^T - \nabla H(x) f(x)^T}{\nabla H(x)^T \nabla H(x)}. \end{aligned}$$
(5)
Unless \(d=2\), this is generally not a unique choice of S(x), as e.g.
$$\begin{aligned} S(x) = \frac{f(x) g(x)^T - g(x) f(x)^T}{g(x)^T \nabla H(x)} \end{aligned}$$
will satisfy (2) for any function \(g : \mathbb {R}^d \rightarrow \mathbb {R}^d\) such that \(g(x)^T \nabla H(x)\) does not vanish. Many ODEs with first integrals have a well-known skew-gradient form (4). This includes Poisson systems, and the important class consisting of canonical Hamiltonian ODEs. For the latter, S will be constant, so that we may write
$$\begin{aligned} \dot{x} = S \nabla H(x). \end{aligned}$$
(6)
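As a quick numerical illustration of the default formula (5), consider the following sketch; the harmonic oscillator test case and the function names are illustrative choices, not taken from the paper. The second assertion holds because \(f(x)^T \nabla H(x) = 0\) pointwise whenever H is a first integral.

```python
import numpy as np

def default_S(f, gradH, x):
    """The default formula (5): a skew-symmetric S(x) with f(x) = S(x) grad H(x)."""
    fx, gx = f(x), gradH(x)
    return (np.outer(fx, gx) - np.outer(gx, fx)) / (gx @ gx)

# Harmonic oscillator: H(q, p) = (q^2 + p^2)/2 is a first integral of f(q, p) = (p, -q).
f = lambda x: np.array([x[1], -x[0]])
gradH = lambda x: np.array([x[0], x[1]])
x0 = np.array([1.0, 0.5])
S = default_S(f, gradH, x0)
assert np.allclose(S, -S.T)               # skew-symmetry
assert np.allclose(S @ gradH(x0), f(x0))  # recovers the vector field, i.e. (2)
```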
A numerical integrator preserving a first integral H exactly is called an integral-preserving, or energy-preserving, method. Starting in the late 1970s, a few energy-preserving methods were proposed which relied on some discrete analogue of the property (3), see e.g. [9, 23, 25, 26]. Most prominent among these is the class of methods called discrete gradient methods, defined formally by Gonzalez in [18] and given their current name in [30].
Given the first integral H, a discrete gradient \(\overline{\nabla } H : \mathbb {R}^d \times \mathbb {R}^d \rightarrow \mathbb {R}^d\) is a function satisfying the conditions
$$\begin{aligned} \overline{\nabla } H(x,y)^\text {T} (y-x)&= H(y) - H(x), \end{aligned}$$
(7)
$$\begin{aligned} \overline{\nabla } H(x,x)&= \nabla H(x), \end{aligned}$$
(8)
for all \(x,y\in \mathbb {R}^d\). Introducing also the discrete approximation \(\overline{S}(x,y,h)\) to S(x), skew-symmetric and satisfying \(\overline{S}(x,x,0) = S(x)\), the corresponding discrete gradient method is given by
$$\begin{aligned} \frac{\hat{x}-x}{h} = \overline{S}(x,\hat{x},h) \overline{\nabla }H(x,\hat{x}). \end{aligned}$$
(9)
This scheme satisfies a discrete analogue to (3):
$$\begin{aligned} H(\hat{x}) - H(x) = h \overline{\nabla }H(x,\hat{x})^T \overline{S}(x,\hat{x},h) \overline{\nabla }H(x,\hat{x}) = 0. \end{aligned}$$
We say that (9) is consistent with the skew-gradient system (4), since \(\overline{S}(x,\hat{x},h)\) is a consistent approximation of S(x) and \(\overline{\nabla }H(x,\hat{x})\) is a consistent approximation of \(\nabla H(x)\).
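In practice the implicit equation (9) must be solved at every step. The following is a minimal sketch of one step, using fixed-point iteration; the names dg_step, Sbar and dgH are illustrative, and the choice of solver and initial guess is an assumption for simplicity, not prescribed by the paper.

```python
import numpy as np

def dg_step(x, h, Sbar, dgH, tol=1e-12, maxit=100):
    """One step of the discrete gradient method (9), solved by fixed-point iteration.

    Sbar(x, y, h): skew-symmetric approximation of S(x), with Sbar(x, x, 0) = S(x);
    dgH(x, y):     a discrete gradient of H, satisfying (7)-(8).
    """
    y = x + h * Sbar(x, x, 0.0) @ dgH(x, x)  # forward Euler as initial guess
    for _ in range(maxit):
        y_new = x + h * Sbar(x, y, h) @ dgH(x, y)
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y
```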
If \(d\ge 2\), there are in general infinitely many functions satisfying (7)–(8). Many explicit definitions of concrete discrete gradients have been suggested, and we will discuss the most prominent among them in Sect. 2.1. One of these is the average vector field (AVF) discrete gradient, first introduced in [22] and sometimes called the mean value discrete gradient [30]. For a given H, it is given by the average of \(\nabla H\) on the segment \([x,y]\):
$$\begin{aligned} \overline{\nabla }_{\text {AVF}} H(x,y) = \int _0^1 \nabla H ((1-\xi )x + \xi y) \, \mathrm {d}\xi . \end{aligned}$$
(10)
When applied to the constant S system (6), the discrete gradient method with \(\overline{S}(x,y,h) = S\) and \(\overline{\nabla }H = \overline{\nabla }_{\text {AVF}} H\) coincides with the scheme
$$\begin{aligned} \frac{\hat{x}-x}{h} = \int _0^1 f ((1-\xi )x + \xi \hat{x}) \, \mathrm {d}\xi . \end{aligned}$$
(11)
This is sometimes viewed as a method by itself, applicable to any system (1), in which case it is called the average vector field (AVF) method [37]. This was shown in [6] to be a B-series method.
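The integral in (10) must in general be evaluated numerically; a sketch, assuming Gauss–Legendre quadrature (the number of nodes is an arbitrary choice), which is exact whenever H is polynomial of sufficiently low degree:

```python
import numpy as np

def avf_dg(gradH, x, y, nodes=4):
    """The AVF discrete gradient (10), with the integral approximated by
    Gauss-Legendre quadrature mapped from [-1, 1] to [0, 1]."""
    xi, w = np.polynomial.legendre.leggauss(nodes)
    xi, w = (xi + 1) / 2, w / 2
    return sum(wk * gradH((1 - xk) * x + xk * y) for xk, wk in zip(xi, w))
```

Passing this, together with \(\overline{S}(x,y,h) = S\), to the step solver sketched above yields the AVF method (11).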
As pointed out in [30], the discrete gradient is restricted by its definition to be at best a second order approximation to point values of \(\nabla {H}\). In much of the literature on discrete gradient methods, see e.g. [18, 21], the approximation \(\overline{S}\) is defined as being independent of h. In that case, the discrete gradient scheme (9) can at best guarantee second order convergence towards the exact solution. Over the last two decades, some notable papers on higher order discrete gradient methods have been published. McLaren and Quispel were first, with the bootstrapping technique derived in [31, 32]. Given any discrete gradient \(\overline{\nabla }H\) and an approximation to S(x) given by \(\overline{S}(x,y,h)\), they compare the Taylor expansion of the corresponding discrete gradient scheme to that of the exact solution, and thus find a new approximation \(\tilde{S}(x,y,h)\) to S(x) which yields higher order. This quickly becomes a very involved procedure, but by using a symmetric discrete gradient, they derive fourth order methods. A downside of this approach is that the schemes of order higher than two require the calculation of tensors of order three or higher at every time step.
A fourth order generalization of the AVF method is proposed by the same authors in [37]. This can be viewed as a fourth-order discrete gradient method for all skew-gradient systems where S is constant. Also worth mentioning in this setting is the collocation-like method introduced by Hairer [20] and then generalized to Poisson systems by Cohen and Hairer [10]. This is a multi-stage extension of the AVF discrete gradient method. To get higher than second order, more than one stage is required. In that case the method is not a discrete gradient method, although it is energy-preserving.
Norton et al. show in [35] that linear projection methods can be viewed as a class of discrete gradient methods for skew-gradient systems with S(x) given by the default formula (5). In connection to this, Norton and Quispel suggest in [36] the class of approximations to (5) given by
$$\begin{aligned} \overline{S}(x,y,h) = \frac{\tilde{f}(x,y,h) \tilde{g}(x,y,h)^T - \tilde{g}(x,y,h) \tilde{f}(x,y,h)^T}{\hat{g}(x,y,h)^T \breve{g}(x,y,h)}, \end{aligned}$$
(12)
where \(\tilde{f}(x,y,h)\) is a consistent approximation to f(x), and \(\tilde{g}(x,y,h), \hat{g}(x,y,h)\) and \(\breve{g}(x,y,h)\) are all consistent approximations to \(\nabla H(x)\). The corresponding discrete gradient method then inherits the order of the method \(\hat{x}= x + h \tilde{f}(x,\hat{x},h)\).
The use of the discrete gradient method extends beyond the numerical integration of ODEs. It can be applied to the time integration of partial differential equations (PDEs) which have constants of motion, guaranteeing preservation of a discrete approximation of an integral [5, 14, 16]. Higher order methods have been studied in this context, but only schemes that expand on the AVF method [3, 24]. Furthermore, building on the recently introduced Hamiltonian neural networks [8, 19], Matsubara et al. have shown how discrete gradients can be used to learn energy-preserving systems from data [27]. Since the energy is given by a neural network, none of the higher order methods reviewed above are applicable in that case, and the authors suggest using multi-step methods to get higher than second order.
To the best of our knowledge, no one has so far suggested higher than fourth order discrete gradient methods for a general skew-gradient system (4). Furthermore, for this general case, all suggested discrete gradient methods of order higher than two involve tensors of order three or higher. In this paper we present general theory for achieving methods of arbitrary order, using any discrete gradient and not depending on tensors of order higher than two. Largely inspired by the above-mentioned references, especially [31, 32, 37], we present here a general form giving a class of approximations \(\overline{S}(x,y,h)\) to any S(x) in (4), with corresponding conditions for achieving an arbitrary order of the discrete gradient method (9). We do this step by step. In the next section, we derive some useful properties of a general discrete gradient and discuss the most common specific discrete gradients. Then we consider the AVF method and use order theory for B-series methods to obtain a generalization of this, with corresponding order conditions. In Sect. 4, we build on this to develop higher order discrete gradient methods for a general skew-gradient system, using the AVF discrete gradient. Then, in Sect. 5, we generalize this further to allow for a free choice of the discrete gradient, thus arriving at the general form \(\overline{S}(x,y,h)\) mentioned above, and a formal series expansion of the corresponding discrete gradient methods. Throughout the paper, we present several examples of higher order schemes for the different cases. In the final section, we apply some of these schemes to numerical examples: the Hénon–Heiles system with a constant S, a Lotka–Volterra system with a non-constant S, and a pendulum system learned by a neural network.

2 A preliminary analysis of discrete gradients

To simplify notation in the following derivations, we define \(g:=\nabla H\). Furthermore, we suppress the first argument of \(\overline{\nabla }H\) and define \(\bar{g}(y) := \overline{\nabla }H(x,y)\). We use Einstein summation convention, with a comma in the subscript to differentiate between components of covectors and derivatives. That is, we write \(\bar{g}(y)^i_{,j} := \frac{\partial \bar{g}(y)^i}{\partial y^j}\), \(\bar{g}(y)_{i,j} := \frac{\partial \bar{g}(y)_i}{\partial y^j}\) and so forth. Taylor expanding \(\bar{g}(y)\) around x, we get
$$\begin{aligned} \begin{aligned} \bar{g}(y)^i =&\, \bar{g}(x)^i + \bar{g}(x)^i_{,j}(y^j-x^j) + \frac{1}{2}\bar{g}(x)^i_{,j k}(y^j-x^j)(y^k-x^k) \\&+ \frac{1}{6}\bar{g}(x)^i_{,j k l}(y^j-x^j)(y^k-x^k)(y^l-x^l) + \mathcal {O}(|y-x |^4), \end{aligned} \end{aligned}$$
(13)
or
$$\begin{aligned} \bar{g}(y) =&\sum _{\kappa =0}^\infty \frac{1}{\kappa !} \bar{g}^{(\kappa )}(x)(y-x)^\kappa . \end{aligned}$$
(14)
By the consistency criterion (8), we have \(\bar{g}(x) = g(x)\). However, if we require the discrete gradient to be a differentiable function in its second argument, (8) follows directly from (7). To see this, we write (7) as
$$\begin{aligned} H(y)-H(x) = \bar{g}(y)_i (y^i - x^i). \end{aligned}$$
(15)
Differentiating this with respect to \(y^j\), we get
$$\begin{aligned} g(y)_j = H(y)_{,j} = \bar{g}(y)_{i,j} (y^i - x^i) + \bar{g}(y)_j. \end{aligned}$$
(16)
The case \(y= x\) immediately gives \(g(x)_j = \bar{g}(x)_j\), or (8). Assuming further that \(\overline{\nabla }H \in C^2(\mathbb {R}^d \times \mathbb {R}^d,\mathbb {R}^d)\), we can differentiate once more to get
$$\begin{aligned} g(y)_{j,k} = H(y)_{,jk} = \bar{g}(y)_{i,jk} (y^i - x^i) + \bar{g}(y)_{j,k} + \bar{g}(y)_{k,j}, \end{aligned}$$
(17)
which means that
$$\begin{aligned} g(x)_{j,k} = H(x)_{,jk} = \bar{g}(x)_{j,k} + \bar{g}(x)_{k,j}, \end{aligned}$$
or
$$\begin{aligned} D^2 H(x) = D_{2} \overline{\nabla }H(x,x) + (D_{2} \overline{\nabla }H(x,x))^\text {T}, \end{aligned}$$
(18)
where \(D^2 H :=D\nabla H\) denotes the Hessian of H, and \(D_{2} \overline{\nabla }H\) denotes the Jacobian of \(\overline{\nabla }H\) with respect to its second argument.
Lemma 1
If the discrete gradient \(\overline{\nabla }H\) is symmetric, i.e. \(\overline{\nabla }H(x,y) = \overline{\nabla }H(y,x)\) for all \(x, y \in \mathbb {R}^d\), then
$$\begin{aligned} D_{2} \overline{\nabla }H(x,x) = \frac{1}{2}D^2 H(x). \end{aligned}$$
(19)
Proof
Reinstating the suppressed argument x in (16), we have
$$\begin{aligned} g(y)_j = \frac{\partial }{\partial y^j}(\bar{g}(x,y)_i) (y^i - x^i) + \bar{g}(x,y)_j, \end{aligned}$$
which we can differentiate by \(x^k\) to get
$$\begin{aligned} 0 = \frac{\partial ^2}{\partial x^k\partial y^j}(\bar{g}(x,y)_i) (y^i - x^i) - \frac{\partial }{\partial y^j}\bar{g}(x,y)_k + \frac{\partial }{\partial x^k}\bar{g}(x,y)_j. \end{aligned}$$
If \(\overline{\nabla }H\) is symmetric,
$$\begin{aligned} \frac{\partial }{\partial x^k} \bar{g}(x,y)_j = \frac{\partial }{\partial x^k} \bar{g}(y,x)_j. \end{aligned}$$
Thus, for \(y= x\) we get \(\bar{g}(x)_{k,j} = \bar{g}(x)_{j,k}\), or \((D_{2} \overline{\nabla }H(x,x))^\text {T} = D_{2} \overline{\nabla }H(x,x)\). Inserting that in (18), we obtain (19).
Definition 1
Given a discrete gradient \(\overline{\nabla }H \in C^1(\mathbb {R}^d \times \mathbb {R}^d,\mathbb {R}^d)\), we define the function \(Q : \mathbb {R}^d \times \mathbb {R}^d \rightarrow \mathbb {R}^{d\times d}\) by
$$\begin{aligned} Q(x,y) :=\frac{1}{2}\left( (D_2\overline{\nabla }H(x,y))^T - D_2\overline{\nabla }H(x,y) \right) . \end{aligned}$$
(20)
Note that \(Q(x,y)\) is a skew-symmetric matrix. From (18), we see that \(Q(x,x) = \frac{1}{2}D^2 H(x) - D_2\overline{\nabla }H(x,x)\). The following result is crucial for the main result of this paper: the development of a general theory for higher order discrete gradient methods.
Lemma 2
For a discrete gradient \(\overline{\nabla }H \in C^{p}(\mathbb {R}^d \times \mathbb {R}^d,\mathbb {R}^d)\) and the corresponding Q given by (20),
$$\begin{aligned} D_2^{\kappa } \overline{\nabla }H(x,x)v^\kappa = \frac{1}{\kappa +1}D^{\kappa }\nabla H(x)v^\kappa - \frac{2\kappa }{\kappa +1} D_2^{\kappa -1} Q(x,x) v^\kappa \quad \text { for any } \kappa \in [1,p],\ v \in \mathbb {R}^d. \end{aligned}$$
Proof
Differentiating (17) \(\kappa -1\) times by \(y\) and setting \(y= x\), we find that the \(\kappa \)th derivatives of g(x) can be expressed by the \(\kappa \)th derivatives of \(\bar{g}(x)\) through the relation
$$\begin{aligned} g(x)_{j,I} = \bar{g}(x)_{j,I} + \sum _{m=1}^\kappa \bar{g}(x)_{i_m, \left\{ j, I_m\right\} }, \quad \text { for all } j,I,\kappa , \end{aligned}$$
(21)
where \(I = \left\{ i_1,i_2,\dots ,i_\kappa \right\} \) is an ordered set of \(\kappa \) indices, and
$$\begin{aligned} I_m = I \setminus \left\{ i_m\right\} = \left\{ i_1,i_2,\ldots i_{m-1},i_{m+1},\ldots ,i_\kappa \right\} , \end{aligned}$$
i.e. I with the mth element excluded. Similarly, by continued differentiation of (20), we obtain
$$\begin{aligned} (D_2^{\kappa -1} Q(x,x))_{j,I} = \frac{1}{2}\bar{g}(x)_{i_1, \left\{ j, I_1\right\} } - \frac{1}{2}\bar{g}(x)_{j,I}. \end{aligned}$$
Thus
$$\begin{aligned} (D_2^{\kappa -1}&Q(x,x) v^\kappa )_j = (D_2^{\kappa -1} Q(x,x))_{j,I} v^I = \frac{1}{2}\bar{g}(x)_{i_1,\left\{ j, I_1\right\} }v^{\left\{ j, I_1\right\} } -\frac{1}{2}\bar{g}(x)_{j,I}v^I\\&=\frac{1}{2\kappa } \sum _{m=1}^\kappa \bar{g}(x)_{i_m,\left\{ j, I_m\right\} }v^{\left\{ j, I_m\right\} } + \frac{1}{2\kappa } \bar{g}(x)_{j,I}v^I- \frac{1}{2\kappa } \bar{g}(x)_{j,I}v^I -\frac{1}{2}\bar{g}(x)_{j,I}v^I\\&=\frac{1}{2\kappa } g(x)_{j,I}v^I - (\frac{1}{2\kappa } +\frac{1}{2})\bar{g}(x)_{j,I}v^I = \frac{1}{2\kappa } g(x)_{j,I}v^I - \frac{\kappa +1}{2\kappa }\bar{g}(x)_{j,I}v^I. \end{aligned}$$
Solving this for \(\bar{g}(x)_{j,I}v^I = (D_2^{\kappa } \overline{\nabla }H(x,x)v^\kappa )_j\) yields the assertion.

2.1 Properties of different discrete gradients

While introducing the discrete gradient methods in [18], Gonzalez also gave an example of a discrete gradient satisfying (7)–(8): the midpoint discrete gradient is given by
$$\begin{aligned} \overline{\nabla }_\text {M} H(x,y) :=\nabla H\left( \frac{x+y}{2}\right) + \frac{H(y)-H(x)-\nabla H\left( \frac{x+y}{2}\right) ^T\left( y-x\right) }{(y-x)^T(y-x)}\left( y-x\right) . \end{aligned}$$
Even when H is analytic, this discrete gradient is often not; the second order partial derivatives are in general singular at \(y=x\). For that reason, it is not suited for achieving higher order methods by the techniques we consider in this paper.
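A direct transcription of the midpoint discrete gradient, with a guard for the singular case \(y = x\) where the correction term is undefined; the tolerance is an ad hoc choice:

```python
import numpy as np

def midpoint_dg(H, gradH, x, y, tol=1e-14):
    """Gonzalez' midpoint discrete gradient; falls back to the exact gradient
    at the midpoint when y is (numerically) equal to x."""
    m, d = (x + y) / 2, y - x
    dd = d @ d
    if dd < tol:
        return gradH(m)
    return gradH(m) + (H(y) - H(x) - gradH(m) @ d) / dd * d
```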
The Itoh–Abe discrete gradient method, introduced in [23], notably does not require evaluation of the gradient. The corresponding discrete gradient, which has also been called the coordinate increment discrete gradient [30], is defined by
$$\begin{aligned} \overline{\nabla }_\text {IA}H(x,y) :=\sum _{j=1}^d \alpha _j e_j, \end{aligned}$$
(22)
where \(e_j\) is the \(j^{\text {th}}\) canonical unit vector and
$$\begin{aligned} \alpha _j&= {\left\{ \begin{array}{ll} \dfrac{H(w_j) - H(w_{j-1})}{y^j-x^j} &{} \quad \text {if } y^j \ne x^j,\\ \frac{\partial H}{\partial x^j}(w_{j-1}) &{} \quad \text {if } y^j = x^j, \end{array}\right. }\\ w_j&= \sum \nolimits _{i=1}^j y^i e_i + \sum \nolimits _{i=j+1}^d x^i e_i. \end{aligned}$$
While the other discrete gradients we consider in this paper are symmetric and thus second order approximations to \(\nabla H\), the Itoh–Abe discrete gradient is only of first order. However, a second order discrete gradient, which we call the symmetrized Itoh–Abe (SIA) discrete gradient, is given by
$$\begin{aligned} \overline{\nabla }_\text {SIA} H (x,y) :=\frac{1}{2} \left( \overline{\nabla }_\text {IA}H(x,y)+\overline{\nabla }_\text {IA}H(y,x)\right) . \end{aligned}$$
(23)
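The coordinate-wise definitions (22)–(23) translate directly into code; a minimal sketch, assuming an exact floating-point comparison suffices for detecting the degenerate case \(y^j = x^j\):

```python
import numpy as np

def itoh_abe_dg(H, gradH, x, y):
    """The Itoh-Abe (coordinate increment) discrete gradient (22)."""
    d = len(x)
    g = np.empty(d)
    w = np.array(x, dtype=float)  # w_0 = x
    for j in range(d):
        w_prev = w.copy()         # w_{j-1}
        w[j] = y[j]               # w_j: leading coordinates from y, the rest from x
        if y[j] != x[j]:
            g[j] = (H(w) - H(w_prev)) / (y[j] - x[j])
        else:
            g[j] = gradH(w_prev)[j]
    return g

def sia_dg(H, gradH, x, y):
    """The symmetrized Itoh-Abe discrete gradient (23)."""
    return (itoh_abe_dg(H, gradH, x, y) + itoh_abe_dg(H, gradH, y, x)) / 2
```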
Furihata presented the discrete variational derivative method for a class of PDEs in [16], a method which has been developed further by Furihata, Matsuo and co-authors in a series of papers, e.g. [29, 38], as well as in the monograph [17]. As shown in [14], these schemes can also be obtained by semi-discretizing the PDE in space and then applying a discrete gradient method to the resulting system of ODEs. We give here the definition of the specific discrete gradient that gives the schemes of Furihata and co-authors, defined for a class of invariants that includes all polynomial functions:
Definition 2
Assume that we can write the first integral as
$$\begin{aligned} H(x) = \sum _l c_l \prod _{j=1}^d f^l_{j}(x^j), \end{aligned}$$
(24)
for functions \(f^l_{j} : \mathbb {R} \rightarrow \mathbb {R}\). The Furihata discrete gradient \(\overline{\nabla }_\text {F} H(x,y)\) is defined by
$$\begin{aligned} \overline{\nabla }_\text {F}H(x,y) :=\sum _{j=1}^d \alpha _j e_j, \end{aligned}$$
(25)
where \(e_j\) is the \(j^{\text {th}}\) canonical unit vector and
$$\begin{aligned} \alpha _j&= {\left\{ \begin{array}{ll} \sum \limits _l \frac{c_l}{2} \frac{f_j^l(y^j)-f_j^l(x^j)}{y^j-x^j} \left( \prod \limits _{k=1}^{j-1}{f_k^l(x^k)} + \prod \limits _{k=1}^{j-1}{f_k^l(y^k)}\right) \prod \limits _{k=j+1}^{d}{\frac{f_k^l(x^k)+f_k^l(y^k)}{2}} &{} \quad \text {if } y^j \ne x^j,\\ \sum \limits _l \frac{c_l}{2} \frac{\mathrm {d}f^l_{j}(x^j)}{\mathrm {d}x^j} \left( \prod \limits _{k=1}^{j-1}{f_k^l(x^k)} + \prod \limits _{k=1}^{j-1}{f_k^l(y^k)}\right) \prod \limits _{k=j+1}^{d}{\frac{f_k^l(x^k)+f_k^l(y^k)}{2}} &{} \quad \text {if } y^j = x^j. \end{array}\right. } \end{aligned}$$
The discrete gradient introduced by Matsubara and co-authors in [27] is related to the Furihata discrete gradient, in that both rely on a discrete analogue of the product rule for derivatives. More importantly, however, the former, obtained by what the authors call the automatic discrete differentiation algorithm, also depends on an analogue of the chain rule. By using this algorithm instead of standard automatic differentiation in a neural network that learns a dynamical system from data, the discrete gradient is obtained together with the preserved energy. Thus the learned system can be integrated in an energy-preserving manner, which distinguishes the approach of Matsubara et al. from comparable studies [8, 19, 39].
Lastly we consider the AVF discrete gradient (10), which has several noteworthy characteristics.
Lemma 3
The Q(xy) corresponding to the AVF discrete gradient is the zero matrix, since \((D_{2} \overline{\nabla }_{\text {AVF}} H (x,y))^\text {T} = D_{2} \overline{\nabla }_{\text {AVF}} H(x,y)\).
Proof
For \(\bar{g}(y) :=\overline{\nabla }_{\text {AVF}} H(x,y)\), we have
$$\begin{aligned} \bar{g}(y)_{i,j}&= \frac{\partial }{\partial y^j} \int _0^1 g((1-\xi )x + \xi y)_i \, \mathrm {d}\xi = \int _0^1 \frac{\partial }{\partial y^j} g((1-\xi )x + \xi y)_i \, \mathrm {d}\xi \\&= \int _0^1 \xi g((1-\xi )x + \xi y)_{i,j} \, \mathrm {d}\xi = \int _0^1 \xi g((1-\xi )x + \xi y)_{j,i} \, \mathrm {d}\xi = \bar{g}(y)_{j,i}. \end{aligned}$$
From the Integrability Lemma (see e.g. [21, Lemma VI.2.7]) and the above, we have that the AVF discrete gradient defines a gradient vector field:
Corollary 1
The AVF discrete gradient is the gradient with respect to the second argument of a function \(\tilde{H}(x,y)\). That is,
$$\begin{aligned} \overline{\nabla }_{\text {AVF}} H(x,y) = \nabla _2 \tilde{H} (x,y), \end{aligned}$$
for some \(\tilde{H} : \mathbb {R}^d \times \mathbb {R}^d \rightarrow \mathbb {R}\) and all \(x, y \in \mathbb {R}^d\).
Any other discrete gradient will have these properties only for functions H for which it coincides with the AVF discrete gradient:
Proposition 1
The AVF discrete gradient is the unique discrete gradient satisfying \((D_{2} \overline{\nabla }H (x,y))^\text {T} = D_{2} \overline{\nabla }H(x,y)\) for all H, x and y, and it has the formal expansion
$$\begin{aligned} \overline{\nabla }_\text {AVF} H(x,y) = \sum _{\kappa =0}^\infty \frac{1}{(\kappa +1)!} D^{\kappa }\nabla H(x)(y-x)^\kappa . \end{aligned}$$
(26)
Proof
Assume that \(\nabla H\) is an analytic function. As in the proof of Lemma 2, let \(I = \left\{ i_1,i_2,\dots ,i_\kappa \right\} \) be an ordered set of \(\kappa \) indices, and let \(I_m\) be I with the \(m^{\text {th}}\) element excluded. If \(\bar{g}(y)_{i,j} = \bar{g}(y)_{j,i}\) for all i, j, then also
$$\begin{aligned} \bar{g}(y)_{i,I} = \bar{g}(y)_{i_m,\left\{ i,I_m\right\} } \quad \text { for all } i,I,m. \end{aligned}$$
(27)
Inserting (27) in (21) we get \(g^{(\kappa )}(x) = (1+\kappa ) \bar{g}^{(\kappa )}(x)\). Then inserting this for \(\bar{g}^{(\kappa )}(x)\) in (14), we get (26), which uniquely defines the AVF discrete gradient.
A consequence of the above result is that the AVF discrete gradient is the unique discrete gradient for which the scheme (9) with \(\overline{S}(x,\hat{x},h)=S\) is a B-series method when applied to the system (6).
As we see from the above definitions and discussion, each of the discrete gradients has its advantages and disadvantages. Gonzalez' midpoint discrete gradient is easily calculated from the energy H and the gradient \(\nabla H\), but it is in general only once differentiable. The Itoh–Abe and Furihata discrete gradient methods do not require knowledge of the gradient, but the former is only a first order method and the latter is only defined for H of the form (24). The AVF discrete gradient is the unique discrete gradient whose series expansion is given by the differentials of the gradient. It does however require an integral to be calculated. The discrete gradient of Matsubara and co-authors has a different area of use than the others: it is only defined when H is a neural network. In that case neither Gonzalez' midpoint, the AVF, nor the Furihata discrete gradient can be obtained.
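As a sanity check, the defining property (7) can be verified numerically for the sketches given above; the polynomial test function H here is an arbitrary illustrative choice, not taken from the paper.

```python
import numpy as np

# Check of the defining property (7) for the discrete gradient sketches above.
H = lambda x: x[0]**2 * x[1] + x[1]**3
gradH = lambda x: np.array([2 * x[0] * x[1], x[0]**2 + 3 * x[1]**2])
x0, y0 = np.array([0.3, -1.2]), np.array([1.1, 0.4])

for dg in (midpoint_dg(H, gradH, x0, y0),
           sia_dg(H, gradH, x0, y0),
           avf_dg(gradH, x0, y0)):
    assert np.isclose(dg @ (y0 - x0), H(y0) - H(x0))
```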
In the next section we consider the specific case where the AVF discrete gradient is chosen and S is constant, so that we can build on established results for B-series methods to develop new theory.

3 A generalization of the AVF method

Let us recall the concept of B-series. Referring to the definitions in [21, Section III.1], we let T be the set of rooted trees, built recursively by starting with the single-node tree \(\bullet \) and letting \(\tau = [\tau _1,\ldots ,\tau _m]\) be the tree obtained by grafting the roots of the trees \(\tau _1,\ldots ,\tau _m\) to a new root. Furthermore, \(F(\tau )\) is the elementary differential associated with the tree \(\tau \), defined by \(F(\bullet )(x) = f(x)\) and
$$\begin{aligned} F(\tau )(x) = f^{(m)}(x) \big (F(\tau _1)(x),\ldots ,F(\tau _m)(x)\big ), \end{aligned}$$
and \(\sigma (\tau )\) is the symmetry coefficient for \(\tau \), defined by \(\sigma (\bullet ) = 1\) and
$$\begin{aligned} \sigma (\tau ) = \sigma (\tau _1) \cdots \sigma (\tau _m) \cdot \mu _1! \, \mu _2! \cdots , \end{aligned}$$
(28)
where the integers \(\mu _1, \mu _2, \ldots \) count equal trees among \(\tau _1,\ldots ,\tau _m\). Then, if \(\phi : T \cup \lbrace \emptyset \rbrace \rightarrow \mathbb {R}\) is an arbitrary map, a B-series is a formal series
$$\begin{aligned} B(\phi ,x) = \phi (\emptyset )x + \sum _{\tau \in T}\frac{h^{|\tau |}}{\sigma (\tau )}\phi (\tau )F(\tau )(x). \end{aligned}$$
(29)
The exact solution of (1) can be written as the B-series \(B(\frac{1}{\gamma },x)\), where the coefficient \(\gamma \) satisfies \(\gamma (\bullet ) = 1\) and
$$\begin{aligned} \gamma (\tau ) = |\tau |\, \gamma (\tau _1) \cdots \gamma (\tau _m) \quad \text { for } \tau = [\tau _1,\ldots ,\tau _m], \end{aligned}$$
(30)
where \(|\tau |\) is the order, i.e. the number of nodes, of \(\tau \).
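The recursions (28) and (30) are straightforward to implement once a rooted tree is represented as the tuple of its subtrees; the following sketch uses that representation (an assumption of ours, with () denoting \(\bullet \)) and checks one entry of Table 1 below.

```python
from collections import Counter
from math import factorial, prod

# A rooted tree is represented as the tuple of its subtrees; () is the single node.
def order(t):
    """The order |tau|, i.e. the number of nodes."""
    return 1 + sum(order(s) for s in t)

def gamma(t):
    """The coefficients of the exact solution, via the recursion (30)."""
    return order(t) * prod(gamma(s) for s in t)

def sigma(t):
    """The symmetry coefficient (28)."""
    return prod(sigma(s) for s in t) * prod(factorial(m) for m in Counter(t).values())

# The order four tree [., [.]]: sigma = 1 and gamma = 4 * 1 * 2 = 8, cf. Table 1.
t = ((), ((),))
assert (order(t), sigma(t), gamma(t)) == (4, 1, 8)
```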
Definition 3
The generalized AVF method is given by
$$\begin{aligned} \frac{\hat{x}-x}{h}= & {} \, \Bigg (I + \sum _{n=2}^{p-1} h^n \sum _{j} b_{nj} \bigg ( \prod _{k=1}^n f'(z_{njk}) + (-1)^n \prod _{k=1}^n f'(z_{nj(n-k+1)}) \bigg ) \Bigg )\nonumber \\&\cdot \int _0^1 f((1-\xi )x + \xi \hat{x}) \, \mathrm {d}\xi , \end{aligned}$$
(31)
where each \(z_{njk} :=z_{njk}(x,\hat{x},h) = B(\phi _{njk},x)\) can be written as a B-series with \(\phi _{njk}(\emptyset ) = 1\).
Theorem 1
When applied to (1) with \(f(x) = S\nabla H(x)\), where S is a constant skew-symmetric matrix, the scheme (31) preserves H, in that \(H(\hat{x}) = H(x)\).
Proof
With \(f(x) = S\nabla H(x)\), (31) becomes
$$\begin{aligned} \frac{\hat{x}-x}{h} = \overline{S}(x,\hat{x},h) \overline{\nabla }_\text {AVF} H(x,\hat{x}), \end{aligned}$$
with
$$\begin{aligned} \overline{S}(x,\hat{x},h) = S + \sum _{n=2}^{p-1} h^n \sum _{j} b_{nj} \left( \prod _{k=1}^n S D^2H(z_{njk}) + (-1)^n \prod _{k=1}^n S D^2H(z_{nj(n-k+1)}) \right) S. \end{aligned}$$
We have
$$\begin{aligned} \left( \prod _{k=1}^n \right.&\left. S D^2H(z_{njk}) \cdot S + (-1)^n \prod _{k=1}^n S D^2H(z_{nj(n-k+1)} ) \cdot S \right) ^T \\&= S^T \prod _{k=1}^n \big (D^2H(z_{nj(n-k+1)})^T S^T\big ) + (-1)^n S^T \prod _{k=1}^n \big (D^2H(z_{njk})^T S^T\big ) \\&= (-1)^{n+1} S \prod _{k=1}^n D^2H(z_{nj(n-k+1)}) S - S \prod _{k=1}^n D^2H(z_{njk}) S \\&= - \left( \prod _{k=1}^n S D^2H(z_{njk}) \cdot S + (-1)^n \prod _{k=1}^n S D^2H(z_{nj(n-k+1)} ) \cdot S \right) , \end{aligned}$$
and thus \(\overline{S}(x,\hat{x},h)\) is a skew-symmetric matrix.
Before considering the order conditions of the generalized AVF method, let us recall a couple of results from the literature on B-series.
Lemma 4
([21, Lemma III.1.9]) Let \(B(a,x)\) be a B-series with \(a(\emptyset ) = 1\). Then \(h f(B(a,x)) = B(a',x)\) is also a B-series, with \(a'(\emptyset ) = 0\), \(a'(\bullet ) = 1\) and otherwise
$$\begin{aligned} a'(\tau ) = a(\tau _1) \cdots a(\tau _m) \quad \text { for } \tau = [\tau _1,\ldots ,\tau _m]. \end{aligned}$$
Lemma 5
([34, Theorem 2.2]) Let \(B(a,x)\) and \(B(b,x)\) be two B-series with \(a(\emptyset ) = 1\) and \(b(\emptyset ) = 0\). Then \(h f'(B(a,x))B(b,x) = B(a\times b,x)\), i.e. a B-series, with \((a \times b)(\emptyset ) = 0\) and otherwise
$$\begin{aligned} (a \times b)(\tau )&= \sum _{i=1}^m \prod _{j=1, j \ne i}^m a(\tau _j)b(\tau _i) \quad \text { for } \tau = [\tau _1,\ldots ,\tau _m]. \end{aligned}$$
Proposition 1 in [6] states that the standard AVF method is a B-series method. We build on the proof of that proposition to prove the following result.
Proposition 2
The generalized AVF method (31) is a B-series method.
Proof
First we define \(\hat{e} : T \cup \lbrace \emptyset \rbrace \rightarrow \mathbb {R}\) by \(\hat{e}(\emptyset ) = 1\) and \(\hat{e}(\tau ) = 0\) for all \(\tau \ne \emptyset \). Then, assuming that the solution \(\hat{x}\) of (31) can be written as the B-series \(\hat{x}= B(\Phi ,x)\), we find the B-series
$$\begin{aligned} h \int _0^1 f\big ((1-\xi )x + \xi \hat{x}\big ) \, \mathrm {d}\xi&= h \int _0^1 f\big (B((1-\xi )\hat{e}+\xi \Phi ,x)\big ) \, \mathrm {d}\xi \\&= \int _0^1 B\big (((1-\xi )\hat{e}+\xi \Phi )',x\big ) \, \mathrm {d}\xi \\&= B\big (\int _0^1 ((1-\xi )\hat{e}+\xi \Phi )' \, \mathrm {d}\xi ,x\big ). \end{aligned}$$
Setting \(\theta :=\int _0^1 ((1-\xi )\hat{e}+\xi \Phi )' \, \mathrm {d}\xi = \int _0^1((1-\xi )\hat{e})'\, \mathrm {d}\xi + \int _0^1 (\xi \Phi )' \, \mathrm {d}\xi = \int _0^1 (\xi \Phi )' \, \mathrm {d}\xi \), we get
$$\begin{aligned} \theta (\emptyset ) = 0, \qquad \theta ([\tau _1,\ldots ,\tau _m]) = \frac{1}{m+1}\Phi (\tau _1)\cdots \Phi (\tau _m). \end{aligned}$$
(32)
Then we may rewrite (31) as
$$\begin{aligned} \hat{x}=&\, x + \left( I + \sum _{n=2}^{p-1} h^n \sum _j b_{nj} \left( \prod _{k=1}^n f'(B(\phi _{njk},x)) + (-1)^n \prod _{k=1}^n f'(B(\phi _{nj(n-k+1)},x)) \right) \right) B(\theta ,x)\\ =&\, x + B(\theta ,x)\\&+ \sum _{n=2}^{p-1} \sum _j b_{nj} \left( B(\phi _{nj1} \times \cdots \times \phi _{njn} \times \theta ,x) + (-1)^n B(\phi _{njn} \times \cdots \times \phi _{nj1} \times \theta ,x) \right) \\ =&\, B(\Phi ,x), \end{aligned}$$
with
$$\begin{aligned} \Phi = \hat{e} + \theta +\sum _{n=2}^{p-1} \sum _j b_{nj} \left( \phi _{nj1} \times \cdots \times \phi _{njn} \times \theta + (-1)^n \, \phi _{njn} \times \cdots \times \phi _{nj1} \times \theta \right) . \end{aligned}$$
(33)
Comparing the B-series of the exact solution and the B-series of the solution of (31), and noting that the elementary differentials are independent, we immediately get the following result.
Theorem 2
The generalized AVF method (31) is of order p if and only if
$$\begin{aligned} \Phi (\tau ) = \frac{1}{\gamma (\tau )} \quad \text { for } |\tau |\le p, \end{aligned}$$
(34)
where \(\Phi \) is given by (33) and \(\gamma \) is given by (30).
Table 1
Elementary differentials and their coefficients in the B-series of the solution of (31), up to fourth order

\(|\tau |\) | \(F(\tau )^i\) | \(\tau \) | \(\sigma (\tau )\) | \(\gamma (\tau )\) | \(\Phi (\tau )\)
1 | \(f^i\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq135_HTML.gif | 1 | 1 | 1
2 | \(f^i_jf^j\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq137_HTML.gif | 1 | 2 | \(\frac{1}{2}\)
3 | \(f^i_{jk}f^jf^k\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq140_HTML.gif | 2 | 3 | \(\frac{1}{3}\)
3 | \(f^i_jf^j_kf^k\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq143_HTML.gif | 1 | 6 | \(\frac{1}{4}+2\sum _j b_{2j}\)
4 | \(f^i_{jkl}f^jf^kf^l\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq146_HTML.gif | 6 | 4 | \(\frac{1}{4}\)
4 | \(f^i_{jk}f^jf^k_lf^l\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq149_HTML.gif | 1 | 8 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq150_HTML.gif
4 | \(f^i_{j}f^j_{kl}f^kf^l\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq152_HTML.gif | 2 | 12 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq153_HTML.gif
4 | \(f^i_{j}f^j_{k}f^k_lf^l\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq155_HTML.gif | 1 | 24 | \(\frac{1}{8}+2\sum _j b_{2j}\)
The terms \(\Phi (\tau )\) can be found from (33) by applying Lemma 5 recursively, as illustrated by the following example.
Example 1
Consider https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq158_HTML.gif , and assume we have found \(\Phi \) for all trees up to and including order four already, as given in Table 1. We have
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ162_HTML.png
Then we calculate
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ163_HTML.png
where we have used in the second equality that https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq160_HTML.gif . Similarly we find https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq161_HTML.gif . Furthermore,
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ164_HTML.png
and https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq162_HTML.gif . Hence,
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ165_HTML.png
Now, if we assume the order condition (34) to be satisfied for all trees up to and including order four, we can replace \(\sum _j b_{2j} = -\frac{1}{24}\) and https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq164_HTML.gif in the above expression, use that https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq165_HTML.gif , and get that (34) is satisfied for https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq166_HTML.gif if and only if
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ35_HTML.png
(35)

3.1 Construction of higher order schemes

As the size of the trees grows, finding \(\Phi (\tau )\) from (33) can become quite a cumbersome operation. Furthermore, we observe from Table 1 that there are some equivalent order conditions for different trees. Before presenting more convenient techniques for finding order conditions for the generalized AVF method, let us define some more concepts related to B-series and trees.
First, recall that the Butcher product of two trees \(u = [u_1,\ldots ,u_m]\) and \(v = [v_1,\ldots ,v_n]\) is given by \(u \circ v = [u_1,u_2,\ldots ,u_m,v]\). This operation is neither associative nor commutative, and in contrast to the practice in [21], we here take the product of several factors without parentheses to mean evaluation from right to left:
$$\begin{aligned} u_1 \circ u_2 \circ \cdots \circ u_k :=u_1 \circ ( u_2 \circ ( \cdots \circ u_k)). \end{aligned}$$
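With the tuple representation used in the earlier sketch, the Butcher product and its right-to-left chaining are one-liners; again an illustrative sketch, not from the paper.

```python
def butcher(u, v):
    """The Butcher product u ∘ v = [u_1, ..., u_m, v]: graft v onto the root of u."""
    return u + (v,)

def butcher_chain(*trees):
    """u_1 ∘ u_2 ∘ ... ∘ u_k, evaluated from right to left as in the text."""
    t = trees[-1]
    for u in reversed(trees[:-1]):
        t = butcher(u, t)
    return t

# [.] ∘ [.] ∘ [.] = [., [., [.]]], a tree of order six:
assert butcher_chain(((),), ((),), ((),)) == ((), ((), ((),)))
```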
Given a forest \(\mu = (\tau _1,\ldots ,\tau _m)\), the tree obtained by grafting the roots of every tree in \(\mu \) to a new root is denoted by \([\mu ] = [\tau _1,\ldots ,\tau _m]\). Moreover, \(\mu ^{-1}(\tau )\) denotes the forest such that \([\mu ^{-1}(\tau )] = \tau \). We extend the maps \(\phi : T \cup \lbrace \emptyset \rbrace \rightarrow \mathbb {R}\) and \(\gamma : T \cup \lbrace \emptyset \rbrace \rightarrow \mathbb {R}\) to forests by letting \(\phi (\mu ) = \prod _{i=1}^m \phi (\tau _i)\) and \(\gamma (\mu ) = \prod _{i=1}^m \gamma (\tau _i)\) for \(\mu = (\tau _1,\ldots ,\tau _m)\).
Consider now a tree \(\tau \) consisting of \(|\tau |\) nodes. We may number every node from 1 to \(|\tau |\), starting at the root and going from left to right on the increasing levels above. For a given node \(i \in \lbrace 1,\ldots ,|\tau |\rbrace \) on level \(n+1\), there exists a unique set of forests \(\hat{\tau }^i = \lbrace \mu ^i_1,\ldots ,\mu ^i_{n+1}\rbrace \) such that
$$\begin{aligned} \tau =[\mu ^i_1] \circ [\mu ^i_2] \circ \cdots \circ [\mu ^i_{n+1}]. \end{aligned}$$
That is, labeling node i,
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ166_HTML.png
Proposition 3
The \(\Phi \) of (34) can alternatively be found by
$$\begin{aligned} \Phi (\tau ) = \hat{e}(\tau ) + \theta (\tau ) +\sum _{i \text { s.t. } n\ge 2}\Lambda (\hat{\tau }^i) \end{aligned}$$
(36)
where \(\hat{e}(\emptyset ) = 1\) and \(\hat{e}(\tau ) = 0\) for all \(\tau \ne \emptyset \), \(\theta (\emptyset ) = 0\),
$$\begin{aligned} \theta ([\tau _1,\ldots ,\tau _m]) = \frac{1}{m+1}\Phi (\tau _1)\cdots \Phi (\tau _m), \end{aligned}$$
and
$$\begin{aligned} \Lambda (\hat{\tau }^i) = \theta ([\mu _{n+1}^i]) \sum _j b_{nj} \left( \phi _{nj1}(\mu _1^i) \cdots \phi _{njn}(\mu _n^i) + (-1)^n \phi _{njn}(\mu _1^i) \cdots \phi _{nj1}(\mu _n^i) \right) . \end{aligned}$$
(37)
Proof
Define \(n_i\) so that \(n_i+1\) is the level of node i. Collect the children of node i in the set \(C_i\). We have
$$\begin{aligned}{}[\mu _{n_i+1}^i] = [\mu _{n_k}^k] \circ [\mu _{n_k+1}^k] \quad \text { for all } k\in C_i, \end{aligned}$$
and thus
$$\begin{aligned} (a\times b)([\mu _{n_i+1}^i]) = \sum _{k\in C_i} a(\mu _{n_k}^k) b([\mu _{n_k+1}^k]). \end{aligned}$$
Note also that \(\mu _{n_i}^i = \mu _{n_i}^k = \mu _{n_k-1}^k\) if \(k\in C_i\). Then we get
$$\begin{aligned} (\phi _{nj1} \times \cdots \times \phi _{njn} \times \theta )(\tau )&= (\phi _{nj1} \times \cdots \times \phi _{njn} \times \theta )([\mu _1^{1}])\\&=\sum _{i_1\in C_1} \phi _{nj1}(\mu _1^{i_1})(\phi _{nj2} \times \cdots \times \phi _{njn} \times \theta )([\mu _2^{i_1}])\\&=\sum _{i_1\in C_1} \phi _{nj1}(\mu _1^{i_1}) \sum _{i_2\in C_{i_1}} \phi _{nj2}(\mu _2^{i_2}) (\phi _{nj3} \times \cdots \times \phi _{njn} \times \theta )([\mu _3^{i_2}])\\&=\sum _{i_1\in C_1} \sum _{i_2\in C_{i_1}} \phi _{nj1}(\mu _1^{i_2})\phi _{nj2}(\mu _2^{i_2}) (\phi _{nj3} \times \cdots \times \phi _{njn} \times \theta )([\mu _3^{i_2}])\\&\quad \vdots \\&=\sum _{i_1\in C_1} \sum _{i_2\in C_{i_1}} \cdots \sum _{i_n\in C_{i_{n-1}}} \phi _{nj1}(\mu _1^{i_n}) \cdots \phi _{njn}(\mu _n^{i_n})\theta ([\mu _{n+1}^{i_n}])\\&=\sum _{i \text{ on } \text{ level } n+1} \phi _{nj1}(\mu _1^{i}) \cdots \phi _{njn}(\mu _n^{i})\theta ([\mu _{n+1}^{i}]). \end{aligned}$$
Inserting this and the corresponding result for \((\phi _{njn} \times \cdots \times \phi _{nj1} \times \theta )(\tau )\) in (33), we get (36).
In [7, 15], conditions are derived for a B-series method to be energy preserving when applied to the system (6). In [37], while giving the AVF method as one such method, Quispel and McLaren present a general form of what they call energy-preserving linear combinations of rooted trees:
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ167_HTML.png
Here we give their result as a lemma, which will follow from the proof of the more general Theorem 5.
Lemma 6
Let \(\mu _1, \ldots , \mu _n\) be n arbitrary forests. Then, if \(f(x)=S\nabla H(x)\) for some skew-symmetric constant matrix S, we have that \(F(\omega )(x) \cdot \nabla H(x) = 0\) for
$$\begin{aligned} \omega = [\mu _1] \circ [\mu _2] \circ \cdots \circ [\mu _n] \circ [\emptyset ] + (-1)^n \, [\mu _n] \circ [\mu _{n-1}] \circ \cdots \circ [\mu _1] \circ [\emptyset ]. \end{aligned}$$
(38)
There is a connection between (36) and Lemma 6 which allows us to calculate order conditions for every energy-preserving linear combination instead of for every tree. To see this, we start by collecting the leaf nodes, i.e. nodes with no children, of the tree \(\tau \) in a set \(I_l\) and the other nodes in the set \(I_n\). If node \(i \in I_n\), we may then use the relation
$$\begin{aligned} \Lambda (\lbrace \mu ^i_1,\ldots ,\mu ^i_{n},\mu ^i_{n+1}\rbrace ) = \theta ([\mu _{n+1}^i]) \Lambda (\lbrace \mu ^i_1,\ldots ,\mu ^i_{n},\emptyset \rbrace ) \end{aligned}$$
to find \(\Lambda (\hat{\tau }^i)\) from the previously calculated \(\Lambda \) for a smaller tree. Then, if lower order conditions are satisfied, we have numerical values for these \(\Lambda \). The leaf nodes on the other hand, with their corresponding \(\hat{\tau }^i = \lbrace \mu _1^i,\ldots ,\mu _n^i,\emptyset \rbrace \), give an energy-preserving linear combination (38) to which \(\tau \) belongs. If i is on level two, this combination is simply \(\tau -\tau =0\), and accordingly \(\Lambda \) is not calculated for these nodes in (36). Moreover, leaves on the same level have identical \(\hat{\tau }^i\). Thus, a tree with leaves on m different levels above level two will belong to at most m non-zero energy-preserving linear combinations (38). For each of these combinations there is a corresponding order condition, with the left hand side given by (37). The right hand side can be found by considering the individual trees.
If we assume the conditions for order \(<p\) to be satisfied, we may replace (34) by
$$\begin{aligned} \sum _{i\in I_l}\Lambda (\hat{\tau }^i) = \frac{1}{\gamma (\tau )} - \hat{e}(\tau ) - \sum _{i\in I_n}\frac{\Lambda (\lbrace \mu ^i_1,\ldots ,\mu ^i_{n},\emptyset \rbrace )}{(|\mu _{n+1}^i|+1) \gamma (\mu _{n+1}^i)}, \end{aligned}$$
(39)
where \(|\mu |\) denotes the number of trees in the forest \(\mu \). Note that \(\Lambda (\lbrace \emptyset \rbrace ) = 1\) and hence \(\Lambda (\hat{\tau }^1) = \theta (\tau )\). Then we can calculate the numerical value for the right hand side and, if \(\tau \) has leaves on only one level \(>2\), find an order condition for both \(\tau \) and the other tree in the combination (38). This warrants an example.
Example 2
Consider again the tree https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq220_HTML.gif , which is part of the energy-preserving linear combination https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq221_HTML.gif Ignoring the two nodes on level 2, there are three nodes to calculate \(\Lambda \) for: \(i=1, i=4\) and \(i=5\). We find
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ168_HTML.png
The right hand side of (39) becomes
$$\begin{aligned} \frac{1}{\gamma (\tau )} - \frac{1}{18} - (-\frac{1}{48}) = \frac{1}{30} - \frac{1}{18} + \frac{1}{48} =-\frac{1}{720}, \end{aligned}$$
and we have the order condition (35) for the linear combination https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq225_HTML.gif .
If there are leaves on \(r>1\) different levels above level two, things get slightly more complicated. Then we get r different terms on the left hand side of (39), and we need to consider the order conditions for \(\tau \) and the r trees it forms energy-preserving linear combinations with, so that we get an equation for every energy-preserving combination of these trees, including those not containing \(\tau \). This is illustrated by the following example.
Example 3
The tree https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq229_HTML.gif forms energy-preserving combinations with both https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq230_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq231_HTML.gif . Thus we have to calculate (39) for all three trees to find order conditions for the corresponding linear combinations. Starting with https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq232_HTML.gif , which has three nodes above level two, two leaves and one non-leaf, we get
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ169_HTML.png
For the right hand side of (39), we get
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ170_HTML.png
and hence the order condition for https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq233_HTML.gif is
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ40_HTML.png
(40)
Similarly we calculate (39) for https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq234_HTML.gif ,
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ41_HTML.png
(41)
and for https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq235_HTML.gif ,
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ42_HTML.png
(42)
Combining (40), (41) and (42), we get the equivalent system of equations
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ43_HTML.png
(43)
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ44_HTML.png
(44)
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ45_HTML.png
(45)
where the choice of \(\alpha \in \mathbb {R}\) is arbitrary. The order conditions (43)–(45) can be associated with the three energy-preserving linear combinations formed by pairs of the trees above, the last of these being https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq238_HTML.gif .
Table 2
Energy-preserving linear combinations of elementary differentials, and their associated order conditions for the scheme (31), up to sixth order. The coefficients \(\alpha _1, \alpha _2 \in \mathbb {R}\) are arbitrary

\(|\tau |\) | \(\omega \) | Order condition
1 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq242_HTML.gif | 2
3 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq243_HTML.gif | \(\sum _j b_{2j} = -\frac{1}{24}\)
4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq245_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq246_HTML.gif
5 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq247_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq248_HTML.gif
5 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq249_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq250_HTML.gif
5 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq251_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq252_HTML.gif
5 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq253_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq254_HTML.gif
5 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq255_HTML.gif | \(\sum _j b_{4j}=\frac{1}{240}\)
6 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq257_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq258_HTML.gif
6 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq259_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq260_HTML.gif
6 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq261_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq262_HTML.gif
6 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq263_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq264_HTML.gif
6 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq265_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq266_HTML.gif
6 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq267_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq268_HTML.gif
6 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq269_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq270_HTML.gif
6 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq271_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq272_HTML.gif
6 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq273_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq274_HTML.gif
6 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq275_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq276_HTML.gif
6 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq277_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq278_HTML.gif
By considering the order conditions in Table 2, we find a fifth order scheme of the form (31) given by
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ46_HTML.png
(46)
where
$$\begin{aligned} z_1 = x + \frac{2}{5} h f(x), \qquad z_2 = x + \frac{17+\sqrt{17}}{30} h f(z_1), \qquad z_3 = x + \frac{17-\sqrt{17}}{30} h f(z_1). \end{aligned}$$
A symmetric sixth order scheme is given by
$$\begin{aligned} \frac{\hat{x}-x}{h}= & {} \, \bigg (I - \frac{13}{360} h^2 f'\big (\bar{x}+\frac{\sqrt{13}}{26} h f(\bar{x}-\frac{3\sqrt{13}}{26} h f(\bar{x}))\big ) f'\big (\bar{x}-\frac{\sqrt{13}}{26} h f(\bar{x}+\frac{3\sqrt{13}}{26} h f(\bar{x}))\big ) \nonumber \\&- \,\frac{13}{360} h^2 f'\big (\bar{x}-\frac{\sqrt{13}}{26} h f(\bar{x}+\frac{3\sqrt{13}}{26} h f(\bar{x}))\big ) f'\big (\bar{x}+\frac{\sqrt{13}}{26} h f(\bar{x}-\frac{3\sqrt{13}}{26} h f(\bar{x}))\big ) \nonumber \\&- \,\frac{1}{180} h^2 f'(x) f'(x) - \frac{1}{180} h^2 f'(\hat{x}) f'(\hat{x}) \nonumber \\&+\, \frac{1}{720}h^3 f'(\bar{x}-\frac{1}{2}hf(\bar{x})) f'(\bar{x}) f'(\bar{x}+\frac{1}{2}hf(\bar{x})) \nonumber \\&- \, \frac{1}{720}h^3 f'(\bar{x}+\frac{1}{2}hf(\bar{x})) f'(\bar{x}) f'(\bar{x}-\frac{1}{2}hf(\bar{x})) \nonumber \\&+\, \frac{1}{120} h^4 f'(\bar{x}) f'(\bar{x}) f'(\bar{x}) f'(\bar{x}) \bigg ) \int _0^1 f((1-\xi )x + \xi \hat{x}) \, \mathrm {d}\xi , \end{aligned}$$
(47)
where \(\bar{x}= \frac{x+\hat{x}}{2}\). If we wish to calculate the matrix in front of the integral explicitly, we have a non-symmetric sixth order scheme given by
$$\begin{aligned} \frac{\hat{x}-x}{h}= & {} \, \bigg (I - \frac{13}{360} h^2 \big (f'(z_6)f'(z_7) + f'(z_7)f'(z_6)\big )\nonumber \\&- \frac{1}{180} h^2 \big (f'(x) f'(x) + f'(z_1) f'(z_1)\big ) \nonumber \\&+ \frac{1}{720}h^3 \big ( f'(x) f'(z_2) f'(z_3) - f'(z_3) f'(z_2) f'(x)\big )\nonumber \\&+ \frac{1}{120} h^4 f'(z_2) f'(z_2) f'(z_2) f'(z_2) \bigg ) \int _0^1 f((1-\xi )x + \xi \hat{x}) \, \mathrm {d}\xi , \end{aligned}$$
(48)
with
$$\begin{aligned} \begin{aligned} z_1&= x + \frac{1}{4} h f(x) + \frac{3}{4} h f\big (x+\frac{2}{3} h f(x + \frac{1}{3}h f(x))\big ), \quad z_2 = x + \frac{1}{2} h f(x),\\ z_3&= x + h f(z_2), \quad z_4 = \frac{1}{2}(x+z_3) - \frac{3\sqrt{13}}{26} h f(z_2),\\ z_5&= \frac{1}{2}(x+z_3) + \frac{3\sqrt{13}}{26} h f(z_2), \\ z_6&= \frac{1}{2}(x+z_1) + \frac{\sqrt{13}}{26} h f(z_4), \quad z_7 = \frac{1}{2}(x+z_1) - \frac{\sqrt{13}}{26} h f(z_5). \end{aligned} \end{aligned}$$
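A sketch of one step of (48): the stages and the matrix in front of the integral are computed explicitly, and the remaining implicit equation is solved by fixed-point iteration. Here f, df and avf_integral are assumed supplied by the user (df being the Jacobian \(f'\), and avf_integral the integral of f over the segment, e.g. by the quadrature sketched earlier); the initial guess and the solver are illustrative choices, not prescribed by the paper.

```python
import numpy as np

def avf6_step(f, df, avf_integral, x, h, tol=1e-12, maxit=100):
    """One step of the non-symmetric sixth order scheme (48)."""
    r = np.sqrt(13) / 26
    fx = f(x)
    z2 = x + 0.5 * h * fx
    z1 = x + 0.25 * h * fx + 0.75 * h * f(x + (2/3) * h * f(x + (1/3) * h * fx))
    z3 = x + h * f(z2)
    z4 = 0.5 * (x + z3) - 3 * r * h * f(z2)
    z5 = 0.5 * (x + z3) + 3 * r * h * f(z2)
    z6 = 0.5 * (x + z1) + r * h * f(z4)
    z7 = 0.5 * (x + z1) - r * h * f(z5)
    A = (np.eye(len(x))
         - (13/360) * h**2 * (df(z6) @ df(z7) + df(z7) @ df(z6))
         - (1/180) * h**2 * (df(x) @ df(x) + df(z1) @ df(z1))
         + (1/720) * h**3 * (df(x) @ df(z2) @ df(z3) - df(z3) @ df(z2) @ df(x))
         + (1/120) * h**4 * np.linalg.matrix_power(df(z2), 4))
    y = z3  # initial guess
    for _ in range(maxit):
        y_new = x + h * A @ avf_integral(x, y)
        if np.linalg.norm(y_new - y) < tol:
            return y_new
        y = y_new
    return y
```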

4 AVF discrete gradient methods for general skew-gradient systems

We will now build on the results of the previous section, generalizing them to the situation where S(x) in the skew-gradient system (4) is not necessarily constant. Consider therefore an ODE of the form (4), and set again \(g :=\nabla H\). By Taylor expansion of x around \(t=t_0\) we get
$$\begin{aligned} x(t_0+h)&= \, x + h S g + \frac{h^2}{2}(S' g S g + S g' S g) + \frac{h^3}{6}(S'' g (S g, S g) + 2 S' g' (S g, S g) \\&+ S g'' (S g, S g) + S' g S' g S g + S' g S g' S g\\&+ S g' S' g S g + S g' S g' S g) + \mathcal {O}(h^4), \end{aligned}$$
where \(x:=x(t_0)\), and S, g and their derivatives are evaluated at x. Introducing the notation \(f^\circ :=S' g\) and \(f^\bullet :=S g'\), we can write this in the abbreviated form
$$\begin{aligned} x(t_0+h)= & {} \, x + h f + \frac{h^2}{2}(f^\circ f + f^\bullet f) + \frac{h^3}{6}(f^{\circ \circ } (f, f) + 2 f^{\circ \bullet } (f, f) + f^{\bullet \bullet } (f, f) \nonumber \\&+ f^{\circ } f^{\circ } f + f^{\circ } f^{\bullet } f + f^{\bullet } f^{\circ } f + f^{\bullet } f^{\bullet } f) + \mathcal {O}(h^4). \end{aligned}$$
(49)
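For experimentation, the actions of the maps \(f^\circ \) and \(f^\bullet \) can be approximated by finite differences; a minimal sketch (the step size is an ad hoc choice):

```python
import numpy as np

def f_circ(S, gradH, x, v, eps=1e-6):
    """The action of f∘ = S'g on v, i.e. (S'(x)[v]) gradH(x), by central differences."""
    dS = (S(x + eps * v) - S(x - eps * v)) / (2 * eps)
    return dS @ gradH(x)

def f_bullet(S, gradH, x, v, eps=1e-6):
    """The action of f• = Sg' on v, i.e. S(x) D^2H(x) v, by central differences."""
    dg = (gradH(x + eps * v) - gradH(x - eps * v)) / (2 * eps)
    return S(x) @ dg
```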

4.1 Skew-gradient systems and P-series

A P-series is given by
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ50_HTML.png
(50)
where TP is the set of rooted bi-colored trees and https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq285_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq286_HTML.gif are the subsets of TP whose roots are black and white, respectively [21, Section III.2]. The bi-colored trees are built recursively; starting with https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq287_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq288_HTML.gif , we let https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq289_HTML.gif be the tree obtained by grafting the roots of \(\tau _1,\ldots ,\tau _m\) to a black root and https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq291_HTML.gif the tree obtained by grafting \(\tau _1,\ldots ,\tau _m\) to a white root. No subscript, i.e. \(\tau = [\tau _1,\ldots ,\tau _m]\), means grafting to a black root.
The exact solution of a partitioned system
$$\begin{aligned} \begin{aligned} \dot{x}&= f(x,y), \quad x(t_0) = x_0,\\ \dot{y}&= g(x,y), \quad y(t_0) = y_0, \end{aligned} \end{aligned}$$
(51)
can be written as \((x(t_0+h),y(t_0+h)) = P(1/\gamma ,(x_0,y_0))\), where the coefficient \(\gamma \) is given by https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq296_HTML.gif and (30). As noted in [10], setting \(f(x,y) :=S(y) \nabla H(x)\), the skew-gradient system (4) can be written as (51) with \(g=f\). When \(g=f\), all coefficients and the elementary differentials \(F(\tau )\) in (50) are given independently of the color of the root. Thus for the system (4), it suffices to consider
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ52_HTML.png
(52)
and we have that the exact solution of (4) can be written as \(x(t_0+h) = P(1/\gamma ,x_0)\). Breaking slightly with convention, we define a P-series to be the single row version (52) in the remainder of this paper. Denoting black-rooted subtrees by \(\tau _i\) and white-rooted subtrees by \(\bar{\tau }_i\), the elementary differentials \(F(\tau )\) for the skew-gradient system are given recursively by https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq305_HTML.gif , and
$$\begin{aligned} F(\tau )(x) = S^{(l)}D^m\nabla H(F(\tau _1)(x),\ldots ,F(\tau _m)(x),F(\bar{\tau }_1)(x),\ldots ,F(\bar{\tau }_l)(x)) \end{aligned}$$
(53)
for both https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq306_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq307_HTML.gif. The bi-colored trees in https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq308_HTML.gif and their corresponding elementary differentials F are given up to order three in Table 3. The number of trees grows very quickly with the order; see https://oeis.org/A000151.
Table 3
Bi-colored trees and their elementary differentials up to third order

| \(|\tau |\) | \(F(\tau )^i\) | \(F(\tau )\) | \(\tau \) | \(\alpha (\tau )\) | \(\gamma (\tau )\) | \(\sigma (\tau )\) |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | \(S^i_j g^j\) | \(f\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq317_HTML.gif | 1 | 1 | 1 |
| 2 | \(S^i_{jk} g^j S^{k}_l g^l\) | \(f^\circ f\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq320_HTML.gif | 1 | 2 | 1 |
| 2 | \(S^i_j g^j_{k} S^{k}_l g^l\) | \(f^\bullet f\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq323_HTML.gif | 1 | 2 | 1 |
| 3 | \(S^{i}_{j k m} g^{j} S^{k}_l g^l S^{m}_n g^n\) | \(f^{\circ \circ } (f,f)\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq326_HTML.gif | 1 | 3 | 2 |
| 3 | \(S^{i}_{jk} g^j_{m} S^{k}_l g^l S^{m}_n g^n\) | \(f^{\circ \bullet } (f,f)\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq329_HTML.gif | 2 | 3 | 1 |
| 3 | \(S^{i}_j g^j_{k m} S^{k}_l g^l S^{m}_n g^n\) | \(f^{\bullet \bullet } (f,f)\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq332_HTML.gif | 1 | 3 | 2 |
| 3 | \(S^{i}_{jk} g^{j} S^{k}_{lm} g^l S^{m}_n g^n\) | \(f^{\circ } f^{\circ } f\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq335_HTML.gif | 1 | 6 | 1 |
| 3 | \(S^{i}_j g^j_{k} S^{k}_{lm} g^{l} S^{m}_n g^n\) | \(f^{\bullet } f^{\circ } f\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq338_HTML.gif | 1 | 6 | 1 |
| 3 | \(S^{i}_{jk} g^{j} S^{k}_l g^{l}_m S^{m}_n g^n\) | \(f^{\circ } f^{\bullet } f\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq341_HTML.gif | 1 | 6 | 1 |
| 3 | \(S^{i}_j g^j_{k} S^{k}_l g^l_{m} S^{m}_n g^n\) | \(f^{\bullet } f^{\bullet } f\) | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq344_HTML.gif | 1 | 6 | 1 |
The following lemma is Lemma III.2.2 in [21] amended to fit our setting.
Lemma 7
Let \(P(a,x)\) and \(P(b,x)\) be two P-series with \(a(\emptyset ) = b(\emptyset ) = 1\). Then
$$\begin{aligned} h S(P(a,x)) \nabla H(P(b,x)) = P(a \vee b,x), \end{aligned}$$
where \((a \vee b)(\emptyset ) = 0\), and
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ171_HTML.png
Proposition 4
The AVF discrete gradient scheme
$$\begin{aligned} \frac{\hat{x}-x}{h} = S\bigg (\frac{x+\hat{x}}{2}\bigg ) \int _0^1 \nabla H ((1-\xi )x + \xi \hat{x}) \, \mathrm {d}\xi \end{aligned}$$
(54)
is a second order P-series method.
Proof
As in the proof of Proposition 2, we define \(\hat{e}\) by \(\hat{e}(\emptyset ) = 1\) and \(\hat{e}(\tau ) = 0\) for all \(\tau \ne \emptyset \). Now, assume that the solution \(\hat{x}\) of (54) can be written as the P-series \(\hat{x}= P(\Phi ,x)\). Then, using Lemma 7, we find the P-series
$$\begin{aligned}&h \, S\bigg (\frac{x+\hat{x}}{2}\bigg ) \int _0^1 \nabla H ((1-\xi )x + \xi \hat{x}) \, \mathrm {d}\xi \\&\quad = h \, S\big (P\big (\frac{1}{2}\hat{e}+\frac{1}{2}\Phi ,x\big )\big ) \, \int _0^1 \nabla H \big ( P((1-\xi )\hat{e} + \xi \Phi , \, x)\big ) \, \mathrm {d}\xi \\&\quad = \int _0^1 h \, S\big (P\big (\frac{1}{2}\hat{e}+\frac{1}{2}\Phi , \, x\big )\big ) \,\nabla H \big ( P((1-\xi )\hat{e} + \xi \Phi , \, x)\big ) \, \mathrm {d}\xi \\&\quad = P \bigg ( \int _0^1 \big (\big ( \frac{1}{2}\hat{e}+\frac{1}{2}\Phi \big ) \vee \big ((1-\xi )\hat{e}+\xi \Phi \big ) \big ) \, \mathrm {d}\xi , \, x \bigg ). \end{aligned}$$
Thus we get \(\Phi = \hat{e} + \int _0^1 \big (\big ( \frac{1}{2}\hat{e}+\frac{1}{2}\Phi \big ) \vee \big ((1-\xi )\hat{e}+\xi \Phi \big ) \big ) \, \mathrm {d}\xi = \hat{e} + \int _0^1 \big (\big ( \frac{1}{2} \Phi \big ) \vee \big (\xi \Phi \big ) \big ) \, \mathrm {d}\xi \). That is, \(\Phi (\emptyset ) = 1\), and
$$\begin{aligned} \Phi ([\tau _1,\ldots ,\tau _m,\bar{\tau }_1,\ldots ,\bar{\tau }_l]) = \frac{1}{(m+1) 2^l}\Phi (\tau _1) \cdots \Phi (\tau _m)\Phi (\bar{\tau }_1) \cdots \Phi (\bar{\tau }_l). \end{aligned}$$
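As a check of this recursion, consider the third order tree with a black root and two white leaves, corresponding to \(f^{\circ \circ }(f,f)\) in Table 3, for which \(m=0\) and \(l=2\):
$$\begin{aligned} \Phi (\tau ) = \frac{1}{(0+1)\, 2^2}\, \Phi (\bar{\tau }_1)\Phi (\bar{\tau }_2) = \frac{1}{4}, \end{aligned}$$
since \(\Phi \) of a single node is 1. As \(\sigma (\tau ) = 2\), this tree contributes \(\frac{h^3}{8} f^{\circ \circ }(f,f)\) to \(P(\Phi ,x)\), in agreement with the expansion below.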
Writing out the first few terms of the series, we have
$$\begin{aligned} \begin{aligned} \hat{x}&= \, x + h f + \frac{h^2}{2}(f^\circ f + f^\bullet f) + h^3(\frac{1}{8}f^{\circ \circ } (f, f) + \frac{1}{4} f^{\circ \bullet } (f, f) + \frac{1}{6}f^{\bullet \bullet } (f, f) \\&\quad + \frac{1}{4}f^{\circ } f^{\circ } f + \frac{1}{4} f^{\circ } f^{\bullet } f + \frac{1}{4} f^{\bullet } f^{\circ } f + \frac{1}{4} f^{\bullet } f^{\bullet } f) + \mathcal {O}(h^4), \end{aligned} \end{aligned}$$
which, after comparing with the expanded exact solution (49), we see is of order two.
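As a concrete illustration, here is a minimal Python sketch of one step of (54), with the same caveats as the earlier sketch: the predictor, the tolerance and the helper avf_integral (defined in the sketch for (48)) are illustrative assumptions, not part of the method's definition.
```python
import numpy as np

def step_avf2(S, gradH, x, h, tol=1e-13, maxit=100):
    # One step of the second order AVF scheme (54): S is evaluated at the
    # midpoint (x + x_hat)/2 and the AVF integral is applied to grad H.
    x_hat = x + h * S(x) @ gradH(x)          # explicit Euler predictor
    for _ in range(maxit):                   # fixed-point iteration
        x_new = x + h * S((x + x_hat) / 2) @ avf_integral(gradH, x, x_hat)
        if np.linalg.norm(x_new - x_hat) < tol:
            return x_new
        x_hat = x_new
    return x_hat
```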
The following lemma is obtained in a manner similar to Lemma 5, i.e. Theorem 2.2 in [34], and hence we present it without its proof.
Lemma 8
Let \(P(a,x)\), \(P(b,x)\) and \(P(c,x)\) be three P-series with \(a(\emptyset ) = b(\emptyset ) = 1\) and \(c(\emptyset ) = 0\). Then
$$\begin{aligned} h \, S(P(a,x)) D^2 H(P(b,x)) P(c,x) = P((a,b) \times c,x) \end{aligned}$$
with https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq357_HTML.gif and otherwise
$$\begin{aligned} ((a,b) \times c)(\tau )&= \sum _{i=1}^m \prod _{j=1, j \ne i}^m \prod _{k=1}^l a(\bar{\tau }_k) b(\tau _j) c(\tau _i) \quad \text { for } \tau = [\tau _1,\ldots ,\tau _m,\bar{\tau }_1,\ldots ,\bar{\tau }_l]. \end{aligned}$$
(55)
Note that \(\emptyset \) counts as both a black-rooted and a white-rooted tree. Hence we have e.g.
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ172_HTML.png
where we also use that https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq359_HTML.gif .
We now present a subclass of the AVF discrete gradient method, for which we will find order conditions using Lemmas 7 and 8. This subclass consists of all AVF discrete gradient methods for which the approximation of S(x) can be written in the form
$$\begin{aligned} \begin{aligned} \overline{S}(x,\hat{x},h) =&\sum _{n=0}^{p-1} h^n \sum _j b_{nj} \Bigg (\prod _{k=1}^n S(\bar{z}_{njk}) D^2 H(z_{njk}) \cdot S(\bar{z}_{nj(n+1)}) \\&+ (-1)^n \, S(\bar{z}_{nj(n+1)}) \prod _{k=1}^n D^2 H(z_{nj(n-k+1)}) S(\bar{z}_{nj(n-k+1)})\Bigg ), \end{aligned} \end{aligned}$$
(56)
where, if \(\hat{x}\) is the solution of
$$\begin{aligned} \frac{\hat{x}-x}{h} = \overline{S}(x,\hat{x},h)\overline{\nabla }_{\text {AVF}} H(x,\hat{x}), \end{aligned}$$
each \(z_{njk} :=z_{njk}(x,\hat{x},h) = P(\phi _{njk},x)\) and each \(\bar{z}_{njk} :=\bar{z}_{njk}(x,\hat{x},h) = P(\psi _{njk},x)\) can be written as a P-series with \(\phi _{njk}(\emptyset ) = \psi _{njk}(\emptyset ) = 1\) for all njk. We require that \(\sum _j b_{0j} = \frac{1}{2}\), which ensures that (56) is a consistent approximation of S(x).
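For example, the scheme (54) of Proposition 4 belongs to this subclass: taking only an \(n=0\) term with \(b_{01} = \frac{1}{2}\) and \(\bar{z}_{011} = \frac{x+\hat{x}}{2}\), the products in (56) are empty and the sum reduces to
$$\begin{aligned} \overline{S}(x,\hat{x},h) = b_{01} \big ( S(\bar{z}_{011}) + S(\bar{z}_{011}) \big ) = S\Big (\frac{x+\hat{x}}{2}\Big ). \end{aligned}$$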
Theorem 3
The discrete gradient scheme (9) with the AVF discrete gradient (10) and the approximation of S(x) given by (56) is a P-series method.
Proof
Generalizing the argument in the proof of Proposition 4, we find the P-series
$$\begin{aligned} h \, S\big (P(a,x)\big ) \int _0^1 \nabla H ((1-\xi )x + \xi \hat{x}) \, \mathrm {d}\xi&= P \bigg ( \int _0^1 \big (a \vee \big ((1-\xi )\hat{e}+\xi \Phi \big ) \big ) \, \mathrm {d}\xi , \, x \bigg ), \end{aligned}$$
where \(\bar{\theta }(a) :=\int _0^1 \big (a \vee \big ((1-\xi )\hat{e}+\xi \Phi \big ) \big ) \, \mathrm {d}\xi = \int _0^1 \big (a \vee \xi \Phi \big ) \, \mathrm {d}\xi \), so that \(\bar{\theta }(a)(\emptyset ) = 0\), and
$$\begin{aligned} \bar{\theta }(a)([\tau _1,\ldots ,\tau _m,\bar{\tau }_1,\ldots ,\bar{\tau }_l]) = \frac{1}{m+1}\Phi (\tau _1) \cdots \Phi (\tau _m)a(\bar{\tau }_1) \cdots a(\bar{\tau }_l). \end{aligned}$$
(57)
Thus we may write the solution \(\hat{x}\) found from applying the scheme (9) with the AVF discrete gradient (10) and \(\overline{S}(x,\hat{x},h)\) given by (56) as
$$\begin{aligned} \begin{aligned} \hat{x}&= x + \sum _{n=0}^{p-1} h^n \sum _j b_{nj} \left( \prod _{k=1}^n S(P(\psi _{njk},x)) D^2 H(P(\phi _{njk},x)) \cdot P(\bar{\theta }(\psi _{nj(n+1)}),x) \right. \\&\quad \left. + (-1)^n \prod _{k=1}^n S(P(\psi _{nj(n-k+2)},x))D^2 H(P(\phi _{nj(n-k+1)},x)) \cdot P(\bar{\theta }(\psi _{nj1}),x) \right) \\&= x + \sum _{n=0}^{p-1} \sum _j b_{nj} \left( P((\psi _{nj1}, \phi _{nj1}) \times (\psi _{nj2}, \phi _{nj2}) \times \cdots \times (\psi _{njn} , \phi _{njn}) \times \bar{\theta }(\psi _{nj(n+1)}),x) \right. \\&\quad \left. + (-1)^n P((\psi _{nj(n+1)}, \phi _{njn}) \times (\psi _{njn}, \phi _{nj(n-1)}) \times \cdots \times (\psi _{nj2} , \phi _{nj1}) \times \bar{\theta }(\psi _{nj1}),x) \right) \\&= P(\Phi ,x), \end{aligned}\nonumber \\ \end{aligned}$$
(58)
with
$$\begin{aligned} \begin{aligned} \Phi&= \hat{e} + \sum _{n=0}^{p-1} \sum _j b_{nj} \, \big ( (\psi _{nj1}, \phi _{nj1}) \times \cdots \times (\psi _{njn}, \phi _{njn}) \times \bar{\theta }(\psi _{nj(n+1)}) \\&\quad + (-1)^n \, (\psi _{nj(n+1)}, \phi _{njn}) \times \cdots \times (\psi _{nj2}, \phi _{nj1}) \times \bar{\theta }(\psi _{nj1})\big ). \end{aligned} \end{aligned}$$
(59)
Theorem 4
The AVF discrete gradient method with \(\overline{S}\) given by (56) is of order p if and only if
$$\begin{aligned} \Phi (\tau ) = \frac{1}{\gamma (\tau )} \quad \text { for } |\tau |\le p. \end{aligned}$$
(60)
The values \(\Phi (\tau )\) can be found from (59) using (55) recursively and then (57). However, a more convenient approach is derived in the next subsection.

4.2 Order conditions

This subsection is devoted to generalizations of the results in Sect. 3.1 to the cases where S(x) is not necessarily constant. To that end, for a tree https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq371_HTML.gif , we cut off all branches between black and white nodes and denote the mono-colored tree we are left with by \(\tau ^{b}\). We number the nodes in that tree as before, from 1 to \(|\tau ^{b}|\), and reattach the cut-off parts to the tree to get \(\tau \) again. Let \(\mu \) denote a forest of black-rooted trees and \(\eta \) a forest of white-rooted trees. Then, for a given node \(i \in [1,\ldots ,|\tau ^{b}|]\) on level \(n+1\), there exists a unique set of forests \(\hat{\tau }^i = \lbrace (\mu ^i_1,\eta ^i_1),\ldots ,(\mu ^i_{n+1},\eta ^i_{n+1})\rbrace \) such that
$$\begin{aligned} \tau =[(\mu ^i_1,\eta ^i_1)] \circ [(\mu ^i_2,\eta ^i_2)] \circ \cdots \circ [(\mu ^i_{n+1},\eta ^i_{n+1})]. \end{aligned}$$
That is,
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ173_HTML.png
Now we can generalize Proposition 3 as follows.
Proposition 5
The \(\Phi \) of (59) can be found by
$$\begin{aligned} \Phi (\tau ) = \hat{e}(\tau ) + \sum _{i=1}^{|\tau ^{b}|}\Lambda (\hat{\tau }^i) \end{aligned}$$
(61)
where \(\hat{e}(\emptyset ) = 1\) and \(\hat{e}(\tau ) = 0\) for all \(\tau \ne \emptyset \), and
$$\begin{aligned} \begin{aligned} \Lambda (\hat{\tau }^i)&= \theta ([\mu _{n+1}^i]) \sum _j b_{nj} \bigg ( \psi _{nj1}(\eta _1^i)\phi _{nj1}(\mu _1^i) \cdots \psi _{njn}(\eta _n^i)\phi _{njn}(\mu _n^i)\psi _{nj(n+1)}(\eta _{n+1}^i) \\&\quad + (-1)^n \psi _{nj(n+1)}(\eta _{1}^i)\phi _{njn}(\mu _1^i)\psi _{njn}(\eta _2^i) \cdots \phi _{nj1}(\mu _n^i)\psi _{nj1}(\eta _{n+1}^i) \bigg ), \end{aligned} \end{aligned}$$
(62)
with
$$\begin{aligned} \theta ([\tau _1,\ldots ,\tau _m]) = \frac{1}{m+1}\Phi (\tau _1)\cdots \Phi (\tau _m). \end{aligned}$$
Proof
Defining \(n_i\) and \(C_i\) as in the proof of Proposition 3, we have
$$\begin{aligned}&[(\mu _{n_i+1}^i,\eta _{n_i+1}^i)] = [(\mu _{n_k}^k,\eta _{n_k}^k)] \circ [(\mu _{n_k+1}^k,\eta _{n_k+1}^k)] \quad \text { for all } k\in C_i,\\&((a,b)\times c)([(\mu _{n_i+1}^i,\eta _{n_i+1}^i)]) = \sum _{k\in C_i} a(\eta _{n_k}^k)b(\mu _{n_k}^k) c([\mu _{n_k+1}^k,\eta _{n_k+1}^k]). \end{aligned}$$
Observe that \(\bar{\theta }(a)([\mu ,\eta ]) = a(\eta ) \theta ([\mu ])\). For \(n=0\) we have
$$\begin{aligned} \bar{\theta }(\psi _{0j1})(\tau ) = \bar{\theta }(\psi _{0j1})([\mu _1^{1},\eta _1^{1}]) = \psi _{0j1}(\eta _{1}^{1})\theta ([\mu _{1}^{1}]), \end{aligned}$$
and for \(n>0\) we get
$$\begin{aligned}&((\psi _{nj1}, \phi _{nj1}) \times \cdots \times (\psi _{njn}, \phi _{njn}) \times \bar{\theta }(\psi _{nj(n+1)}))(\tau )\\&\quad = ((\psi _{nj1}, \phi _{nj1}) \times \cdots \times (\psi _{njn}, \phi _{njn}) \times \bar{\theta }(\psi _{nj(n+1)}))([\mu _1^{1},\eta _1^{1}])\\&\quad = \sum _{i_1\in C_1} \psi _{nj1}(\eta _1^{i_1})\phi _{nj1}(\mu _1^{i_1})((\psi _{nj2},\phi _{nj2})\\&\quad \times \cdots \times (\psi _{njn},\phi _{njn}) \times \bar{\theta }(\psi _{nj(n+1)}))([\mu _2^{i_1},\eta _2^{i_1}])\\&\qquad \vdots \\&\quad = \sum _{i_1\in C_1} \cdots \sum _{i_n\in C_{i_{n-1}}} \psi _{nj1}(\eta _1^{i_n})\phi _{nj1}(\mu _1^{i_n}) \cdots \psi _{njn}(\eta _n^{i_n})\phi _{njn}(\mu _n^{i_n})\bar{\theta }(\psi _{nj(n+1)})([\mu _{n+1}^{i_n},\eta _{n+1}^{i_n}])\\&\quad = \sum _{i \text{ on } \text{ level } n+1} \psi _{nj1}(\eta _1^{i})\phi _{nj1}(\mu _1^{i}) \cdots \psi _{njn}(\eta _n^{i})\phi _{njn}(\mu _n^{i})\psi _{nj(n+1)}(\eta _{n+1}^{i})\theta ([\mu _{n+1}^{i}]). \end{aligned}$$
Inserting this and the corresponding result for \(((\psi _{nj(n+1)}, \phi _{njn}) \times \cdots \times (\psi _{nj2}, \phi _{nj1}) \times \bar{\theta }(\psi _{nj1}))(\tau )\) in (59), we get (62).
Note that if \(\tau \) only has black nodes, we have \(\Lambda (\hat{\tau }^1) = \theta (\tau ) \sum _j b_{0j} (\psi _{0j1}(\emptyset )+\psi _{0j1}(\emptyset )) = \theta (\tau )\), and also \(\Lambda (\hat{\tau }^i) = 0\) for all nodes i on level 2. Thus (61) simplifies to (36).
As in the constant S case, the order conditions can be given for energy-preserving linear combinations of elementary differentials instead of for each elementary differential individually. In the following generalization of Lemma 6, we state that the energy-preserving linear combinations of bi-colored rooted trees are given by
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ174_HTML.png
Theorem 5
Let \(\mu _1, \mu _2, \ldots , \mu _n\) be arbitrary forests of black-rooted trees and \(\eta _1, \eta _2, \ldots , \eta _{n+1}\) arbitrary forests of white-rooted trees. Given \(f(x)=S(x)\nabla H(x)\), where S(x) is a skew-symmetric matrix, and elementary differentials defined by (53), the linear combinations of trees given by
$$\begin{aligned} \omega = [(\mu _1,\eta _1)] \circ \cdots \circ [(\mu _{n},\eta _{n})] \circ [\eta _{n+1}] + (-1)^n \, [(\mu _{n},\eta _{n+1})] \circ \cdots \circ [(\mu _{1},\eta _{2})] \circ [\eta _{1}] \end{aligned}$$
(63)
are energy-preserving in the sense that \(F(\omega )(x) \cdot \nabla H(x) = 0\).
Proof
For any forest of black-rooted trees \(\mu _j\), we have \(F([\mu _j]\circ [\emptyset ]) = S B_j S\nabla H\) for some symmetric matrix \(B_j\), suppressing the argument x. Similarly, for a forest of white-rooted trees \(\eta _j\), we have \(F([\eta _j]) = W_j\nabla H\) for some skew-symmetric matrix \(W_j\). Note that the empty forest is considered both a black-rooted and a white-rooted forest, and accordingly we have https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq403_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq404_HTML.gif . For these matrices \(B_j\) and \(W_j\) corresponding to the forests \(\mu _j\) and \(\eta _j\), we get
$$\begin{aligned} F\big ([(\mu _1,\eta _1)] \circ \cdots \circ [(\mu _{n},\eta _{n})] \circ [\eta _{n+1}]\big ) = W_1 B_1 W_2 B_2 \cdots B_n W_{n+1} \nabla H. \end{aligned}$$
We have
$$\begin{aligned} (W_1 B_1 W_2 B_2 \cdots B_n W_{n+1})^T = {\left\{ \begin{array}{ll} -W_{n+1} B_n W_n B_{n-1} \cdots B_1 W_1 &{} \text {if } n \text { even,}\\ W_{n+1} B_n W_n B_{n-1} \cdots B_1 W_1 &{} \text {if } n \text { odd.} \end{array}\right. } \end{aligned}$$
Thus \(F(\omega )(x)\) is a skew-symmetric matrix times \(\nabla H(x)\), and the statement in the above theorem follows directly.
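For instance, for \(n=1\) the combination (63) yields \(F(\omega )(x) = (W_1 B_1 W_2 - W_2 B_1 W_1)\nabla H(x)\), and
$$\begin{aligned} (W_1 B_1 W_2 - W_2 B_1 W_1)^T = W_2 B_1 W_1 - W_1 B_1 W_2 = -(W_1 B_1 W_2 - W_2 B_1 W_1), \end{aligned}$$
so the matrix acting on \(\nabla H(x)\) is indeed skew-symmetric.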
For bi-colored trees, we define a node on the tree \(\tau \) to be a leaf if it is a leaf on the corresponding cut tree \(\tau ^{b}\) by the definition of leaves given in the previous section. We let \(I_l\) be the set of leaves and \(I_n\) the set of non-leaf nodes which are also in \(\tau ^{b}\), so that \(I_l \cup I_n = [ 1,\ldots ,|\tau ^{b}|]\). In contrast to the case with mono-colored trees, a leaf i on level one or two of a bi-colored tree may give rise to a non-zero energy-preserving linear combination; it does so if and only if \(\eta ^i_{k} \ne \emptyset \) for \(k = 1\) or \(k = 2\). Accordingly, \(\Lambda (\hat{\tau }^i)\) is calculated in (61) also when \(n=0,1\). Furthermore, two leaves i and j on the same level will belong to two different energy-preserving combinations if \(\eta ^i_{n+1} \ne \eta ^j_{n+1}\). Therefore we simply state that a tree with r leaves, now also counting those on the lower two levels, belongs to at most r non-zero linear combinations. We thus get r terms on the left hand side of
$$\begin{aligned} \sum _{i\in I_l}\Lambda (\hat{\tau }^i) = \frac{1}{\gamma (\tau )} - \hat{e}(\tau ) - \sum _{i\in I_n}\frac{\Lambda (\lbrace (\mu ^i_1,\eta _1^i),\ldots ,(\mu ^i_{n},\eta ^i_{n}),(\emptyset ,\eta ^i_{n+1})\rbrace )}{(|\mu _{n+1}^i|+1) \gamma (\mu _{n+1}^i)}, \end{aligned}$$
(64)
which is equivalent to (60) if we assume the conditions for lower order to be satisfied.
Example 4
Consider the tree https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq422_HTML.gif , which is part of the energy-preserving linear combination https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq423_HTML.gif . Assume that the order conditions up to and including order three are all satisfied. The cut tree https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq424_HTML.gif has three nodes of which two are leaves. Node number 2 is a leaf on level 2 with \(\eta ^2_1 = \eta ^2_2 = \emptyset \), and thus gives \(\Lambda (\hat{\tau }^2)=0\). We find for the other two,
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ175_HTML.png
For the right hand side of (64) we get
$$\begin{aligned} \frac{1}{\gamma (\tau )}-\Lambda (\hat{\tau }^1) = \frac{1}{8}-\frac{1}{6} = -\frac{1}{24}, \end{aligned}$$
and thus the order condition
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ176_HTML.png
for the energy-preserving linear combination https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq427_HTML.gif .
Even though the number of black-rooted bi-colored trees grows very quickly, e.g. to 26 for \(|\tau |=4\) and 107 for \(|\tau |=5\), finding and satisfying the order conditions is not as daunting a task as it might first appear. First of all, it suffices to find order conditions for the non-zero linear combinations given by (63). Moreover, a couple of key observations simplify the process further:
  • The many trees \(\tau \) for which https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq431_HTML.gif, i.e. trees with no black nodes on level 2, are all energy-preserving. They can be written \(\tau = [\eta _1^1]\), and their order condition is given by
    $$\begin{aligned} 2 \sum _j b_{0j} \psi _{0j1}(\eta _1^1) = \frac{1}{\gamma (\tau )}. \end{aligned}$$
  • For trees that are identical except for the colors of the descendants of white nodes, it suffices to calculate one order condition. E.g. for https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq433_HTML.gif we have the order condition https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq434_HTML.gif , where each of the gray nodes may be black or white. To satisfy these conditions, it is natural to require that \(\bar{z}_{0j1}\) in (56) is a B-series up to order \(p-1\).
From the order conditions displayed in Table 4 we find that one second order scheme is given by (9) using the AVF discrete gradient (10) and an explicit skew-symmetric approximation of S given by \(\overline{S}(x,\cdot ,h) = S(x + \frac{1}{2} h f(x))\). A third order scheme is obtained if we instead use the skew-symmetric approximation of S explicitly given by
$$\begin{aligned} \begin{aligned} \overline{S}(x,\cdot ,h) =&\, \frac{1}{4} S(x) + \frac{3}{4}S(z_2) + \frac{1}{4} h \, \big ( S(z_1) D^2 H(x)S(x) - S(x) D^2 H(x)S(z_1)\big ) \\&- \frac{1}{12} h^2 \, S(x) D^2 H(x)S(x) D^2 H(x)S(x), \end{aligned} \end{aligned}$$
(65)
where \(z_1 = x + \frac{1}{3} h f(x), z_2 = x + \frac{2}{3} h f(z_1)\).
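The following is a minimal Python sketch, with illustrative helper names, of the explicit third order approximation (65); hessH denotes \(D^2 H\). Since \(\overline{S}\) here depends only on x and h, it can be computed once per step and combined with the AVF integral in a fixed-point iteration as in the earlier sketches.
```python
def S_bar_order3(S, hessH, f, x, h):
    # Explicit third order skew-symmetric approximation (65) of S(x).
    z1 = x + 1/3 * h * f(x)
    z2 = x + 2/3 * h * f(z1)
    return (S(x) / 4 + 3 * S(z2) / 4
            + h / 4 * (S(z1) @ hessH(x) @ S(x) - S(x) @ hessH(x) @ S(z1))
            - h**2 / 12 * S(x) @ hessH(x) @ S(x) @ hessH(x) @ S(x))
```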
Table 4
Linear combinations \(\omega \) of bi-colored black-rooted trees corresponding to energy-preserving elementary differentials of \(f(x) = S(x)\nabla H(x)\), where S(x) is a skew-symmetric matrix, as well as their associated order conditions for the discrete gradient method (9) with the AVF discrete gradient (10) and \(\overline{S}(x,\hat{x},h)\) given by (56)

| \(|\tau |\) | \(\omega \) | Order condition |
| --- | --- | --- |
| 1 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq444_HTML.gif | \(2 \sum _j b_{0j} = 1\) |
| 2 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq446_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq447_HTML.gif |
| 3 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq448_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq449_HTML.gif |
| 3 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq450_HTML.gif | \(\sum _j b_{2j} = -\frac{1}{24}\) |
| 3 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq452_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq453_HTML.gif |
| 3 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq454_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq455_HTML.gif |
| 3 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq456_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq457_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq458_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq459_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq460_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq461_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq462_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq463_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq464_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq465_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq466_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq467_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq468_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq469_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq470_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq471_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq472_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq473_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq474_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq475_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq476_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq477_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq478_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq479_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq480_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq481_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq482_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq483_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq484_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq485_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq486_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq487_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq488_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq489_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq490_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq491_HTML.gif |
A symmetric fourth order scheme is given by (9) using the AVF discrete gradient (10) and the skew-symmetric approximation of S
$$\begin{aligned} \begin{aligned} \overline{S}(x,\hat{x},h) =&\, \frac{1}{2} S\Big (\bar{x}- \frac{1}{\sqrt{12}} h f\big (\bar{x}+ \frac{1}{\sqrt{12}} h f(\bar{x})\big ) \Big )\\&+ \frac{1}{2} S\Big (\bar{x}+ \frac{1}{\sqrt{12}} h f\big (\bar{x}- \frac{1}{\sqrt{12}} h f(\bar{x})\big ) \Big )\\&+ \frac{1}{2} h \, S\big (\bar{x}+\frac{1}{12}h f(\bar{x})\big ) D^2 H(\bar{x})S\big (\bar{x}-\frac{1}{12}h f(\bar{x})\big ) \\&- \frac{1}{2} h \, S\big (\bar{x}-\frac{1}{12}h f(\bar{x})\big ) D^2 H(\bar{x})S\big (\bar{x}+\frac{1}{12}h f(\bar{x})\big ) \\&- \frac{1}{12} h^2 \, S(\bar{x}) D^2 H(\bar{x})S(\bar{x}) D^2 H(\bar{x})S(\bar{x}), \end{aligned}\nonumber \\ \end{aligned}$$
(66)
where \(\bar{x}= \frac{x+\hat{x}}{2}\). Another fourth order scheme is obtained if we instead use the explicit skew-symmetric approximation of S given by
$$\begin{aligned} \begin{aligned} \overline{S}(x,\cdot ,h) =&\, \frac{1}{2} (S(z_5+z_6)+S(z_5-z_6))\\&+ \frac{1}{12} h \, \big ( S(z_2) D^2 H(z_1)S(x) - S(x) D^2 H(z_1)S(z_2)\big ) \\&- \frac{1}{12} h^2 \, S(z_1) D^2 H(z_1)S(z_1) D^2 H(z_1)S(z_1), \end{aligned} \end{aligned}$$
(67)
where
$$\begin{aligned} z_1&= x + \frac{1}{2} h f(x), \quad z_2 = x + h f(z_1), \quad z_3 = x + h f(z_2), \quad z_4 = x + h f(z_3),\\ z_5&= \frac{1}{3}(x+z_1+z_2) + \frac{1}{12} (-z_3+z_4), \quad z_6 = \frac{\sqrt{3}}{36}(7 x-2 z_1-4z_2+z_3-2z_4). \end{aligned}$$
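Below is a sketch of (67) under the same assumed conventions as the earlier sketches; as the approximation is explicit, the only implicitness remaining in the scheme is in the AVF integral.
```python
import numpy as np

def S_bar_order4_explicit(S, hessH, f, x, h):
    # Explicit fourth order skew-symmetric approximation (67) of S(x).
    z1 = x + h * f(x) / 2
    z2 = x + h * f(z1)
    z3 = x + h * f(z2)
    z4 = x + h * f(z3)
    z5 = (x + z1 + z2) / 3 + (z4 - z3) / 12
    z6 = np.sqrt(3) / 36 * (7*x - 2*z1 - 4*z2 + z3 - 2*z4)
    return (0.5 * (S(z5 + z6) + S(z5 - z6))
            + h / 12 * (S(z2) @ hessH(z1) @ S(x) - S(x) @ hessH(z1) @ S(z2))
            - h**2 / 12 * S(z1) @ hessH(z1) @ S(z1) @ hessH(z1) @ S(z1))
```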

5 Order conditions for general discrete gradient methods

We will now generalize the results of the two previous sections to discrete gradient methods with a general discrete gradient, as defined by (7)–(8). To that end, we introduce two new series in the vein of B- and P-series, as well as related tree structures.

5.1 The constant S case

Consider mono-colored rooted trees whose nodes can have two different shapes: the circle shape of the nodes in trees of B-series, but also a triangle shape. Let TG be the set of such trees whose leaves are always circles. That is, from the first tree https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq493_HTML.gif , every tree \(\tau \in TG\) can be built recursively through
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ177_HTML.png
which denotes the grafting of the trees \(\tau _1,\ldots ,\tau _m\) to a root https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq496_HTML.gif or https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq497_HTML.gif , respectively. The elementary differentials \(F(\tau )\) corresponding to a tree \(\tau \in TG\) are likewise defined recursively by https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq500_HTML.gif and
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ178_HTML.png
We can then define a generalization of B-series which includes these elementary differentials.
Definition 4
A G-series is a formal series of the form
$$\begin{aligned} G(\phi ,x) = \phi (\emptyset )x + \sum _{\tau \in TG}\frac{h^{|\tau |}}{\sigma (\tau )}\phi (\tau )F(\tau )(x), \end{aligned}$$
(68)
where \(\phi : TG \cup \lbrace \emptyset \rbrace \rightarrow \mathbb {R}\) is an arbitrary mapping, and the symmetry coefficient \(\sigma \) is given by (28).
The G-series of the exact solution is given by \(x(t_0+h) = G(\xi ,x(t_0))\), with
$$\begin{aligned} \xi (\tau ) = {\left\{ \begin{array}{ll} \frac{1}{\gamma (\tau )} &{} \text {if } \tau \in T,\\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(69)
For use in the remainder of this paper, we generalize the Butcher product by the definition
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ179_HTML.png
Furthermore, we let \(|\tau |\) denote the total number of nodes in \(\tau \), and https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq506_HTML.gif the number of nodes of type https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq507_HTML.gif. Let SG be the set of tall trees in TG; that is, the set of trees with only one node on each level. For a tree \(\tau \in TG\), number every node from 1 to \(|\tau |\), as before. For any node i on level \(n+1\), we define the stem \(s^i \in SG\) to be the tall tree consisting of the nodes connecting the root to node i, including the root and node i. Denote the \(j^\text {th}\) node of \(s^i\) by \(s^i_j\), so that \(s^i_1\) is the root and \(s^i_{n+1} = i\). Then we have a unique set of forests \(\hat{\tau }^i = \lbrace \mu ^i_1,\ldots ,\mu ^i_{n+1}\rbrace \) such that
$$\begin{aligned} \tau =[\mu ^i_1]_{s^i_1} \circ [\mu ^i_2]_{s^i_2} \circ \cdots \circ [\mu ^i_{n+1}]_{s^i_{n+1}}. \end{aligned}$$
That is,
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ180_HTML.png
The following lemma is a generalization of Lemma 5 to G-series. Its proof is very similar to the proof of [34, Theorem 2.2], and hence omitted.
Lemma 9
Let \(G(a,x)\) and \(G(b,x)\) be two G-series with \(a(\emptyset ) = 1\) and \(b(\emptyset ) = 0\). Then the G-series \(h S D^2 H(G(a,x))G(b,x) = G(a\times b,x)\) is given by https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq521_HTML.gif and otherwise
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ181_HTML.png
Moreover, \(h S Q(x,G(a,x))G(b,x) = G(a\otimes b,x)\), with https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq523_HTML.gif and otherwise
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ182_HTML.png
To every stem \(s\in SG\) of height \(n+1 = |s |\), we associate coefficients \(b_{sj}\) and \(\phi _{sjk}\). Letting \(s_k\) be the \(k^{\text {th}}\) node of s, we define the function
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ183_HTML.png
Then we have \(h S R(\phi _{sjk},x) G(b,x) = G(\phi _{sjk} \diamond b,x)\), with https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq531_HTML.gif and
$$\begin{aligned} (\phi _{sjk} \diamond b)(\tau ) = {\left\{ \begin{array}{ll} \sum _{i=1}^m \prod _{j=1, j \ne i}^m \phi _{sjk}(\tau _j)b(\tau _i) &{} \text {for } \tau = [\tau _1,\ldots ,\tau _m]_{s_k},\\ 0 &{} \text {if root of } \tau \ne s_k. \end{array}\right. } \end{aligned}$$
Consider now the class of skew-symmetric and consistent approximations to S that can be written in the form
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ70_HTML.png
(70)
whenever \(\hat{x}\) is the solution of
$$\begin{aligned} \frac{\hat{x}-x}{h} = \overline{S}(x,\hat{x},h)\overline{\nabla }H(x,\hat{x}), \end{aligned}$$
with \(\phi _{sjk}(\emptyset ) = 1\) for every sjk, and with https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq534_HTML.gif .
Lemma 10
The discrete gradient method (9) with \(\overline{S}(x,\hat{x},h)\) given by (70) and \(\overline{\nabla }H \in C^\infty (\mathbb {R}^d\times \mathbb {R}^d,\mathbb {R}^d)\) is a G-series method when applied to a constant S skew-gradient system (6).
Proof
Assume that the solution \(\hat{x}\) of (9) with \(\overline{S}(x,\hat{x},h)\) given by (70) can be written as the G-series \(\hat{x}= G(\Phi ,x)\). Then, using Lemma 2 and \(\overline{\nabla }H(x,x) = \nabla H(x)\),
$$\begin{aligned} h S \overline{\nabla }H(x,\hat{x}) =&\, h S \sum _{m=0}^\infty \frac{1}{m!}D_2^{m} \overline{\nabla }H(x,x)(G(\Phi ,x)-x)^m \\ =&\, h S \sum _{m=0}^\infty \frac{1}{(m+1)!}D^m\nabla H(x)(G(\Phi ,x)-x)^m \\&- h S \sum _{m=1}^\infty \frac{2m}{(m+1)!} D_2^{m-1} Q(x,x) (G(\Phi ,x)-x)^m. \end{aligned}$$
Arguing as in the proof of Lemma III.1.9 in [21], we get \(h \, S \, \overline{\nabla }H(x,\hat{x}) = G(\theta ,x)\), with \(\theta (\emptyset ) = 0\), and
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ71_HTML.png
(71)
Then we can write (9) with \(\overline{S}(x,\hat{x},h)\) given by (70) as
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ184_HTML.png
with
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ72_HTML.png
(72)
Theorem 6
The discrete gradient method (9) with \(\overline{S}(x,\hat{x},h)\) given by (70) and \(\overline{\nabla }H \in C^{\infty }(\mathbb {R}^d\times \mathbb {R}^d,\mathbb {R}^d)\) is of order p if and only if
$$\begin{aligned} \Phi (\tau ) = \xi (\tau ) \quad \text { for } |\tau |\le p, \end{aligned}$$
(73)
where \(\Phi \) is given by (72) and \(\xi \) by (69).
We remark that \(\overline{\nabla }H \in C^{\infty }(\mathbb {R}^d\times \mathbb {R}^d,\mathbb {R}^d)\) is a necessary condition for the method to be a G-series method for all S and H, but not for its order; \(\overline{\nabla }H \in C^{p-1}(\mathbb {R}^d\times \mathbb {R}^d,\mathbb {R}^d)\) is sufficient for the scheme to be of order p. The following proposition is presented without its proof, which follows along the lines of the proof of Proposition 3.
Proposition 6
The \(\Phi \) of (73) satisfies
$$\begin{aligned} \Phi (\tau ) = \hat{e}(\tau ) + \theta (\tau ) +\sum _{i \text { s.t. } n\ge 1}\Lambda (\hat{\tau }^i,s^i) \end{aligned}$$
(74)
where \(\hat{e}(\emptyset ) = 1\) and \(\hat{e}(\tau ) = 0\) for all \(\tau \ne \emptyset \), \(\theta \) is given by (71), and
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ75_HTML.png
(75)
with \(\hat{s}^i\) given by \(\hat{s}^i_k = s^i_{n-k+1}\) for \(k=1,\ldots ,n\), and \(\hat{s}^i_{n+1} = s^i_{n+1}\).
As for the AVF method, one does not need to find the order conditions for every tree; it suffices to find the order condition for each energy-preserving linear combination of the form
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ76_HTML.png
(76)
The above does not give every energy-preserving linear combination of the elementary differentials of G-series; it gives the combinations one gets in the scheme (9) with \(\overline{S}(x,\hat{x},h)\) given by (70). Now, let again \(I_l\) and \(I_n\) denote the sets of leaf nodes and non-leaf nodes, respectively. If we assume the conditions for order \(< p\) to be satisfied, we have an order condition equivalent to (73) given by
$$\begin{aligned} \sum _{i\in I_l}\Lambda (\hat{\tau }^i,s^i) = \xi (\tau ) - \hat{e}(\tau ) - \sum _{i\in I_n} \Lambda (\hat{\tau }^i,s^i), \end{aligned}$$
(77)
where we may use the relation
$$\begin{aligned} \Lambda (\lbrace \mu ^i_1,\ldots ,\mu ^i_{n},\mu ^i_{n+1}\rbrace ,s^i) = \hat{\theta }([\mu _{n+1}^i]_{s_{n+1}^i}) \Lambda (\lbrace \mu ^i_1,\ldots ,\mu ^i_{n},\emptyset \rbrace ,\bar{s}^i) \end{aligned}$$
to calculate \(\Lambda (\hat{\tau }^i)\) for \(i \in I_n\). Here \(\bar{s}^i\) is \(s^i\) with \(s^i_{n+1}\) replaced by https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq567_HTML.gif, and \(\hat{\theta }(\emptyset ) = 0\), and
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ78_HTML.png
(78)
Note that \(\Lambda (\hat{\tau }^1,s^1) = \hat{\theta }(\tau )\).
Table 5
Energy-preserving linear combinations of the form (76) and their associated order conditions for the discrete gradient method (9) with \(\overline{S}(x,\hat{x},h)\) given by (70)

| \(|\tau |\) | \(\omega \) | s | Order condition |
| --- | --- | --- | --- |
| 1 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq573_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq574_HTML.gif | \(\sum _j b_{sj} = \frac{1}{2}\) |
| 2 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq576_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq577_HTML.gif | \(\sum _j b_{sj} = \frac{1}{2}\) |
| 3 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq579_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq580_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq581_HTML.gif |
| 3 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq582_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq583_HTML.gif | \(\sum _j b_{sj} = \frac{1}{2}\) |
| 3 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq585_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq586_HTML.gif | \(\sum _{j} b_{sj} -\sum _{j} b_{\bar{s}j} = 0\) |
| 3 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq588_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq589_HTML.gif | \(\sum _j b_{sj} = -\frac{1}{24}\) |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq591_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq592_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq593_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq594_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq595_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq596_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq597_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq598_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq599_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq600_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq601_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq602_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq603_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq604_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq605_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq606_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq607_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq608_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq609_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq610_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq611_HTML.gif |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq612_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq613_HTML.gif | \(\sum _{j} b_{sj} = \frac{1}{2}\) |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq615_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq616_HTML.gif | \(\sum _{j} b_{sj} -\sum _{j} b_{\bar{s}j} = 0\) |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq618_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq619_HTML.gif | \(\sum _{j} b_{sj} - \sum _{j} b_{\bar{s}j} = 0\) |
| 4 | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq621_HTML.gif | https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq622_HTML.gif | \(\sum _{j} b_{sj} = 0 \) |
From the order conditions in Table 5, we can find an \(\overline{S}(x,\hat{x},h)\) so that (9) becomes a fourth order scheme for any \(\overline{\nabla }H \in C^{3}(\mathbb {R}^d\times \mathbb {R}^d,\mathbb {R}^d)\). For instance, the stem https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq626_HTML.gif has the related order conditions \(\sum _j b_{sj} = \frac{1}{2}\) and https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq628_HTML.gif, which set the requirements for the term
$$\begin{aligned} h^2 \sum _{j} b_{sj} (S Q(x,z_{1j}) S Q(x,z_{2j})+S Q(x,z_{2j}) S Q(x,z_{1j})) S. \end{aligned}$$
Choosing \(b_{s1} = \frac{1}{2}\) and \(z_{11} = z_{21} = x+\frac{2}{3}h f(x)\) fulfills these conditions. Likewise, finding terms that satisfy the other order conditions, we get e.g.
$$\begin{aligned} \begin{aligned} \overline{S}(x,h) =&\, S + h S \big ( \frac{8}{9} Q(x,z_3) + \frac{1}{9} Q(x,x)\big ) S \\&+ h^2 S \big ( Q(x,z_2) S Q(x,z_2) - \frac{1}{12} D^2 H(z_1) S D^2 H(z_1)\big ) S \\&+ h^3 S \big ( Q(x,x) S Q(x,x) S Q(x,x) \\&- \frac{1}{12} D^2 H(x) S D^2 H(x) S Q(x,x) - \frac{1}{12} Q(x,x) S D^2 H(x) S D^2 H(x)\big )S, \end{aligned} \end{aligned}$$
(79)
where
$$\begin{aligned} z_1 = x + \frac{1}{2} h f(x), \qquad z_2 = x + \frac{2}{3} h f(x), \qquad z_3 = x + \frac{3}{4} h f(z_1). \end{aligned}$$
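As a sketch of how (79) can be assembled in practice: the names below are illustrative, S is here a constant matrix, and Q is the function defined earlier in the paper (not reproduced here), supplied by the user.
```python
def S_bar_general4(S, hessH, Q, f, x, h):
    # The fourth order approximation (79) of S-bar for constant S;
    # hessH(x) is D^2 H(x), and Q(x, y) returns a d x d array.
    z1 = x + h * f(x) / 2
    z2 = x + 2/3 * h * f(x)
    z3 = x + 3/4 * h * f(z1)
    return (S
            + h * S @ (8/9 * Q(x, z3) + 1/9 * Q(x, x)) @ S
            + h**2 * S @ (Q(x, z2) @ S @ Q(x, z2)
                          - 1/12 * hessH(z1) @ S @ hessH(z1)) @ S
            + h**3 * S @ (Q(x, x) @ S @ Q(x, x) @ S @ Q(x, x)
                          - 1/12 * hessH(x) @ S @ hessH(x) @ S @ Q(x, x)
                          - 1/12 * Q(x, x) @ S @ hessH(x) @ S @ hessH(x)) @ S)
```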

5.2 The general case

Allowing S to be a function of the solution, we now define the set TV of bi-colored trees whose nodes are either circles or triangles, and whose leaves on the cut tree \(\tau ^{b}\), defined as the mono-colored tree left when all branches between black and white nodes are cut off, are always circles. Denoting as before black-rooted subtrees by \(\tau _i\) and white-rooted subtrees by \(\bar{\tau }_i\), the elementary differentials of trees \(\tau \in TV\) are given by https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq635_HTML.gif and
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ185_HTML.png
where https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq636_HTML.gif can be either https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq637_HTML.gif or https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq638_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq639_HTML.gif can be either https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq640_HTML.gif or https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq641_HTML.gif . Let https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq642_HTML.gif denote the set of trees in TV with black roots, either of the shape https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq643_HTML.gif or https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq644_HTML.gif .
Definition 5
A V-series is a formal series of the form
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ80_HTML.png
(80)
where https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq645_HTML.gif is an arbitrary mapping, and the symmetry coefficient \(\sigma \) is given by (28).
Proofs of the theorems in this subsection can be obtained similarly to the proofs in Sects. 4 and 5.1, and are therefore omitted.
We redefine
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ186_HTML.png
and consider now approximations of S(x) that can be written as
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ81_HTML.png
(81)
whenever \(\hat{x}\) is the solution of
$$\begin{aligned} \frac{\hat{x}-x}{h} = \overline{S}(x,\hat{x},h)\overline{\nabla }H(x,\hat{x}), \end{aligned}$$
with \(\phi _{sjk}(\emptyset ) = \psi _{sjk}(\emptyset ) = 1\) for every sjk, and with https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq649_HTML.gif .
Theorem 7
The discrete gradient scheme (9) with the approximation of S(x) given by (81) and \(\overline{\nabla }H \in C^{\infty }(\mathbb {R}^d\times \mathbb {R}^d,\mathbb {R}^d)\) is a V-series method. It can be written \(\hat{x}= V(\Phi ,x)\), with
$$\begin{aligned} \begin{aligned} \Phi&= \hat{e} + \sum _{s \in SG} \sum _j b_{sj} \, \big ( (\psi _{sj1}, \phi _{sj1}) \diamond \cdots \diamond (\psi _{sjn}, \phi _{sjn}) \diamond \hat{\theta }(\psi _{sj(n+1)}) \\&\quad + (-1)^n \, (\psi _{sj(n+1)}, \phi _{sjn}) \diamond \cdots \diamond (\psi _{sj2}, \phi _{sj1}) \diamond \hat{\theta }(\psi _{sj1})\big ), \end{aligned} \end{aligned}$$
(82)
where
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ83_HTML.png
(83)
The scheme is of order p if and only if
$$\begin{aligned} \Phi (\tau ) = \xi (\tau ) \quad \text { for } |\tau |\le p, \end{aligned}$$
(84)
where
$$\begin{aligned} \xi (\tau ) = {\left\{ \begin{array}{ll} \frac{1}{\gamma (\tau )} &{} \text {if } \tau \in TP,\\ 0 &{} \text {otherwise}. \end{array}\right. } \end{aligned}$$
(85)
As in Sect. 4.2, we cut the branches between black and white nodes, regardless of the shape of the nodes, and denote this tree by \(\tau ^{b}\). Number the nodes and reattach the cut-off parts. For the node i and the corresponding stem \(s^i\), there exists a unique set of forests \(\hat{\tau }^i = \lbrace (\mu ^i_1,\eta ^i_1),\ldots ,(\mu ^i_{n+1},\eta ^i_{n+1})\rbrace \) such that
$$\begin{aligned} \tau =[(\mu _1^i,\eta ^i_1)]_{s^i_1} \circ \cdots \circ [(\mu ^i_{n},\eta ^i_{n})]_{s^i_n} \circ [(\mu ^i_{n+1},\eta ^i_{n+1})]_{s^i_{n+1}}. \end{aligned}$$
Proposition 7
The \(\Phi \) of (82) satisfies
$$\begin{aligned} \Phi (\tau ) = \hat{e}(\tau ) + \sum _{i=1}^{|\tau ^{b}|} \Lambda (\hat{\tau }^i,s^i) \end{aligned}$$
(86)
where \(\hat{e}(\emptyset ) = 1\) and \(\hat{e}(\tau ) = 0\) for all \(\tau \ne \emptyset \), and
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ87_HTML.png
(87)
with \(\theta \) given by (71) and \(\hat{s}^i\) given by \(\hat{s}^i_k = s^i_{n-k+1}\) for \(k=1,\ldots ,n\), and \(\hat{s}^i_{n+1} = s^i_{n+1}\).
The number of trees in TV grows very quickly. However, in our task of finding higher order schemes we may use the lessons of the previous sections, and require that the arguments of \(S, D^2 H\) and Q in (81) are B-series up to order \(p-1\). Then we only need to find order conditions for energy-preserving linear combinations of the form
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ88_HTML.png
(88)
where \(\mu _i\) and \(\eta _i\) are forests of trees in https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq668_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq669_HTML.gif respectively, for \(i=1,\ldots ,n+1\). Thus we can disregard any tree with https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq671_HTML.gif in it. Furthermore, we may color all nodes of the trees in \(\mu _i\) and \(\eta _i\) except the roots gray, and let the elementary differentials corresponding to these trees be the same as the elementary differentials of B-trees.
We find the order conditions
$$\begin{aligned} \sum _{i\in I_l}\Lambda (\hat{\tau }^i,s^i) = \xi (\tau ) - \hat{e}(\tau ) - \sum _{i\in I_n} \Lambda (\hat{\tau }^i,s^i), \end{aligned}$$
by using the relation
$$\begin{aligned}&\Lambda (\lbrace (\mu ^i_1,\eta _1^i),\ldots ,(\mu ^i_{n+1},\eta ^i_{n+1})\rbrace ,s^i)\\&\quad = \hat{\theta }([\mu _{n+1}^i]_{s_{n+1}^i}) \Lambda (\lbrace (\mu ^i_1,\eta _1^i),\ldots ,(\emptyset ,\eta ^i_{n+1})\rbrace ,\bar{s}^i) \end{aligned}$$
to calculate \(\Lambda (\hat{\tau }^i)\) for \(i \in I_n\). Here \(\hat{\theta }\) is given by (78), and \(\bar{s}^i\) is \(s^i\) with \(s^i_{n+1}\) replaced by https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq680_HTML.gif.
Example 5
Consider https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq681_HTML.gif , which is part of the energy-preserving linear combination https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_IEq682_HTML.gif . We have two black nodes, and calculate
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ187_HTML.png
Hence the order condition associated to this linear combination is
https://static-content.springer.com/image/art%3A10.1007%2Fs10543-022-00909-z/MediaObjects/10543_2022_909_Equ188_HTML.png
An approximation of S(x) satisfying all the order conditions in Tables 5 and 6 is given by
$$\begin{aligned} \begin{aligned} \overline{S}(x,\cdot ,h) =&\frac{1}{2} (S(z_{11}+z_{12})+S(z_{11}-z_{12}))\\&+ \frac{1}{12} h \, \big ( S(z_6) D^2 H(z_2)S(x) - S(x) D^2 H(z_2)S(z_6)\big ) \\&+ \frac{3}{7} h \big ( S(z_3)Q(x,z_5)S(z_4) + S(z_4)Q(x,z_5) S(z_3)\big ) \\&+\frac{8}{105} h S(x)Q(x,z_7)S(x) + \frac{1}{15} h S(x)Q(x,x)S(x)\\&+ h^2 \, S(z_2)Q(x,z_5)S(z_8)Q(x,z_5)S(z_2)\\&- \frac{1}{12} h^2 \, S(z_2) D^2 H(z_2)S(z_2) D^2 H(z_2)S(z_2)\\&+ \frac{1}{6} h^2 (S(z_2)-S(x)) D^2 H(x) S(x) Q(x,x) S(x) \\&- \frac{1}{6} h^2 S(x) Q(x,x) S(x) D^2 H(x) (S(z_2)-S(x))\\&+ h^3 S(x) Q(x,x) S(x) Q(x,x) S(x) Q(x,x) S(x) \\&- \frac{1}{12} h^3 S(x) D^2 H(x) S(x) D^2 H(x) S(x) Q(x,x) S(x)\\&- \frac{1}{12} h^3 S(x) Q(x,x) S(x) D^2 H(x) S(x) D^2 H(x)S(x), \end{aligned}\nonumber \\ \end{aligned}$$
(89)
with
$$\begin{aligned}&z_1 = x + \frac{1}{3} h f(x), \qquad z_2 = x + \frac{1}{2} h f(x), \qquad z_3 = x + \frac{7-\sqrt{7}}{12} h f(z_1),\\&z_4 = x + \frac{7+\sqrt{7}}{12} h f(z_1), \qquad z_5 = x + \frac{2}{3} h f(z_2), \qquad z_6 = x + h f(z_2), \\&z_7 = x + \frac{5}{4} h f(z_2), \qquad z_{8} = x + \frac{4}{3} h f(z_2), \qquad z_{9} = x + h f(z_6), \quad z_{10} = x + h f(z_9),\\&z_{11} = \frac{1}{3}(x+z_2+z_6) + \frac{1}{12} (-z_9+z_{10}), \qquad z_{12} = \frac{\sqrt{3}}{36}(7 x-2 z_2-4z_6+z_9-2z_{10}), \end{aligned}$$
and hence a discrete gradient scheme with this \(\overline{S}(x,\cdot ,h)\) and any discrete gradient \(\overline{\nabla }H\in C^3(\mathbb {R}^d\times \mathbb {R}^d,\mathbb {R}^d)\) will be of fourth order.
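To make the structure of (89) concrete, the following is a minimal NumPy sketch of how one might assemble \(\overline{S}(x,\cdot ,h)\). The callables f, S, D2H and Q stand for the vector field, the skew-symmetric matrix, the Hessian and the function Q of the problem at hand, and must be supplied by the user; the name sbar_89 and the interface are our own, not taken from any published code.

```python
import numpy as np

def sbar_89(x, h, f, S, D2H, Q):
    """Assemble the fourth-order approximation (89) of S(x).

    f maps a state vector to a vector; S and D2H map a state vector to
    a d x d matrix; Q maps two state vectors to a d x d matrix.
    """
    s7 = np.sqrt(7.0)
    # Explicit stages, as listed after (89):
    z1 = x + h / 3 * f(x)
    z2 = x + h / 2 * f(x)
    z3 = x + (7 - s7) / 12 * h * f(z1)
    z4 = x + (7 + s7) / 12 * h * f(z1)
    z5 = x + 2 * h / 3 * f(z2)
    z6 = x + h * f(z2)
    z7 = x + 5 * h / 4 * f(z2)
    z8 = x + 4 * h / 3 * f(z2)
    z9 = x + h * f(z6)
    z10 = x + h * f(z9)
    z11 = (x + z2 + z6) / 3 + (z10 - z9) / 12
    z12 = np.sqrt(3.0) / 36 * (7 * x - 2 * z2 - 4 * z6 + z9 - 2 * z10)

    Sx, Sz2, Ax, A2, Qx = S(x), S(z2), D2H(x), D2H(z2), Q(x, x)
    return (
        0.5 * (S(z11 + z12) + S(z11 - z12))
        + h / 12 * (S(z6) @ A2 @ Sx - Sx @ A2 @ S(z6))
        + 3 * h / 7 * (S(z3) @ Q(x, z5) @ S(z4) + S(z4) @ Q(x, z5) @ S(z3))
        + 8 * h / 105 * Sx @ Q(x, z7) @ Sx
        + h / 15 * Sx @ Qx @ Sx
        + h**2 * Sz2 @ Q(x, z5) @ S(z8) @ Q(x, z5) @ Sz2
        - h**2 / 12 * Sz2 @ A2 @ Sz2 @ A2 @ Sz2
        + h**2 / 6 * (Sz2 - Sx) @ Ax @ Sx @ Qx @ Sx
        - h**2 / 6 * Sx @ Qx @ Sx @ Ax @ (Sz2 - Sx)
        + h**3 * Sx @ Qx @ Sx @ Qx @ Sx @ Qx @ Sx
        - h**3 / 12 * Sx @ Ax @ Sx @ Ax @ Sx @ Qx @ Sx
        - h**3 / 12 * Sx @ Qx @ Sx @ Ax @ Sx @ Ax @ Sx
    )
```

Note that the stages \(z_1\) to \(z_{10}\) are explicit Euler-type evaluations of f, so this \(\overline{S}\) is explicit and only needs to be formed once per time step.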
Table 6
The energy-preserving linear combinations of the form (88) not given in Table 5, up to fourth order. Each row lists \(|\tau |\), the combination \(\omega \), the coefficients s and the associated order condition. [The rows — one combination with \(|\tau |=2\), four with \(|\tau |=3\) and eighteen with \(|\tau |=4\) — consist of tree diagrams that are rendered only as images in the source.]
One advantage of choosing the AVF discrete gradient is that the resulting scheme generally requires fewer computations at each time step. This is clearly evident in the above example: if \(\overline{\nabla }= \overline{\nabla }_{\text {AVF}}\), then (89) collapses to (67). However, if the AVF discrete gradient is difficult to calculate, much computational cost can also be saved by choosing a symmetric discrete gradient, like the symmetrized Itoh–Abe discrete gradient (23) or the Furihata discrete gradient (25). Then one can ignore the order condition for any combination (88) for which [image] and \(\mu _j = \emptyset \) for some \(j \in [1,n]\), since this corresponds to elementary differentials involving \(Q(x,x)\), which we recall is zero when the discrete gradient is symmetric. If we consider the conditions for fourth order presented in Tables 5 and 6, this eliminates 17 of the 22 conditions for trees with [image] in the stem. By considering the remaining order conditions we get that, if \(\overline{\nabla }H \in C^3(\mathbb {R}^d\times \mathbb {R}^d,\mathbb {R}^d)\) and \(\overline{\nabla }H(x,y) = \overline{\nabla }H(y,x)\), the discrete gradient scheme (9) is of fourth order if
$$\begin{aligned} \begin{aligned} \overline{S}(x,\cdot ,h) =&\frac{1}{2} (S(z_5+z_6)+S(z_5-z_6))\\&+ \frac{1}{12} h \, \big ( S(z_2) D^2 H(z_1)S(x) - S(x) D^2 H(z_1)S(z_2)\big ) \\&+ \frac{8}{9} h S(z_1) Q(x,z_7) S(z_1) - \frac{1}{12} h^2 \, S(z_1) D^2 H(z_1)S(z_1) D^2 H(z_1)S(z_1), \end{aligned} \end{aligned}$$
(90)
with
$$\begin{aligned}&z_1 = x + \frac{1}{2} h f(x), \qquad z_2 = x + h f(z_1), \qquad z_3 = x + h f(z_2),\\&z_4 = x + h f(z_3), \qquad z_5 = \frac{1}{3}(x+z_1+z_2) + \frac{1}{12} (-z_3+z_4),\\&z_6 = \frac{\sqrt{3}}{36}(7 x-2 z_1-4z_2+z_3-2z_4), \qquad z_7 = x + \frac{3}{4}h f(z_1). \end{aligned}$$
If S is constant, (90) simplifies to
$$\begin{aligned} \overline{S}(x,\cdot ,h) = S + \frac{8}{9} h S Q(x,z_7) S - \frac{1}{12} h^2 \, S D^2 H(z_1)S D^2 H(z_1)S. \end{aligned}$$
(91)
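For constant S and a symmetric discrete gradient, (91) is particularly cheap to evaluate. A minimal sketch under the same conventions as above, with user-supplied callables f, D2H and Q:

```python
import numpy as np

def sbar_91(x, h, f, S, D2H, Q):
    """Fourth-order approximation (91) for constant skew-symmetric S,
    to be used with a symmetric discrete gradient (so Q(x, x) = 0)."""
    z1 = x + h / 2 * f(x)
    z7 = x + 3 * h / 4 * f(z1)
    A = D2H(z1)
    return S + 8 * h / 9 * S @ Q(x, z7) @ S - h**2 / 12 * S @ A @ S @ A @ S
```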

6 Numerical experiments

For all the examples considered in the following, the resulting discrete gradient schemes are nonlinearly implicit systems, which we here solve using Newton’s method. Note that whenever \(\overline{S}(x,\hat{x},h)\) is independent of \(\hat{x}\), the Jacobian of
$$\begin{aligned} F(\hat{x}) = \hat{x}- x - h \overline{S}(x,\hat{x},h)\overline{\nabla }H(x,\hat{x}) \end{aligned}$$
is given by
$$\begin{aligned} J(\hat{x}) = I - h \overline{S}(x,\hat{x},h) D_2 \overline{\nabla }H(x,\hat{x}), \end{aligned}$$
(92)
where \(D_2 \overline{\nabla }H(x,\hat{x})\) may also be used for the calculation of \(Q(x,\hat{x})\). The extra computational cost of using a higher-order scheme with an explicit \(\overline{S}\) thus lies only in the computation of this \(\overline{S}\) once per time step. A scheme with an implicitly defined \(\overline{S}(x,\hat{x},h)\) can give a much more complicated Jacobian, but a quasi-Newton method using the approximate Jacobian (92) may still be very efficient.
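As an illustration, one step of such a (quasi-)Newton solve might be sketched as follows. Here sbar, dgrad and d2_dgrad are hypothetical placeholders for \(\overline{S}(x,\hat{x},h)\), the chosen discrete gradient \(\overline{\nabla }H(x,\hat{x})\) and its Jacobian \(D_2 \overline{\nabla }H(x,\hat{x})\) with respect to the second argument.

```python
import numpy as np

def dg_step(x, h, sbar, dgrad, d2_dgrad, tol=1e-12, maxit=20):
    """One step of the discrete gradient scheme, solved by (quasi-)Newton
    iterations with the (approximate) Jacobian (92)."""
    xhat = x + h * sbar(x, x, h) @ dgrad(x, x)  # explicit predictor
    for _ in range(maxit):
        Sb = sbar(x, xhat, h)
        F = xhat - x - h * Sb @ dgrad(x, xhat)
        if np.linalg.norm(F) < tol:
            break
        J = np.eye(len(x)) - h * Sb @ d2_dgrad(x, xhat)  # eq. (92)
        xhat = xhat - np.linalg.solve(J, F)
    return xhat
```

When \(\overline{S}\) is independent of \(\hat{x}\), J is the exact Jacobian and the iteration is a full Newton method; otherwise it is the quasi-Newton variant just described.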

6.1 Hénon–Heiles system

The Hénon–Heiles system can be written in the form (6) with
$$\begin{aligned} S = \begin{pmatrix} 0 &{} I\\ -I &{} 0 \end{pmatrix}, \qquad H(q,p) = \frac{1}{2}(q_1^2+q_2^2+p_1^2+p_2^2) + q_1^2 q_2 - \frac{1}{3}q_2^3, \end{aligned}$$
(93)
where I is the \(2 \times 2\) identity matrix. We use the same initial conditions as in [37]: \(q_1=\frac{1}{10}\), \(q_2 = -\frac{1}{2}\), \(p_1 = p_2 = 0\). The orders of some of the energy-preserving methods proposed in this paper are confirmed by the left plot in Fig. 1. We compare the performance of the fourth order discrete gradient methods obtained by coupling the \(\overline{S}\) given by (79) with three different discrete gradients: the Itoh–Abe discrete gradient (22), the Furihata discrete gradient (25), and the AVF discrete gradient (10) (right plot of Fig. 1). For this H, the symmetrized Itoh–Abe discrete gradient (23) is identical to the Furihata discrete gradient. The AVF and Furihata discrete gradient methods perform very similarly in this case, and the error of the Furihata discrete gradient method is therefore excluded from the right plot in Fig. 1. We observe that, although it initially performs on par with the AVF method, the Itoh–Abe discrete gradient method gives a lower global error than the other fourth order methods as time goes on. Note, however, that this method requires the most computations per time step.
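For reference, a sketch of the ingredients of (93), with the AVF discrete gradient (10) evaluated by Simpson's rule; since \(\nabla H\) is quadratic for this system, the quadrature is exact. This is an illustrative sketch, not the code used for the experiments.

```python
import numpy as np

# Hénon–Heiles, eq. (93); state ordering x = (q1, q2, p1, p2)
S = np.block([[np.zeros((2, 2)), np.eye(2)],
              [-np.eye(2), np.zeros((2, 2))]])

def grad_H(x):
    q1, q2, p1, p2 = x
    return np.array([q1 + 2 * q1 * q2, q2 + q1**2 - q2**2, p1, p2])

def avf_dgrad(x, y):
    """AVF discrete gradient: the average of grad_H along the segment
    from x to y; Simpson's rule is exact here since grad_H is quadratic."""
    return (grad_H(x) + 4 * grad_H((x + y) / 2) + grad_H(y)) / 6
```

One time step then amounts to calling a solver like dg_step above with these ingredients and the chosen \(\overline{S}\).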

6.2 Lotka–Volterra

The methods should also be tested on a skew-gradient system with non-constant S. We choose the Lotka–Volterra system also used for numerical experiments in [10]. It is given by
$$\begin{aligned} S = \frac{1}{2} \begin{pmatrix} 0 &{} -x_1 x_2 &{} x_1 x_3\\ x_1 x_2 &{} 0 &{} -2 x_2 x_3 \\ -x_1 x_3 &{} 2 x_2 x_3 &{} 0 \end{pmatrix}, \qquad H(x) = 2 x_1 + x_2 + 2x_3 + \ln (x_2) - 2\ln (x_3), \end{aligned}$$
(94)
and initial conditions \(x_1 = 1\), \(x_2=\frac{19}{10}\), \(x_3 = \frac{1}{2}\). For this H, the Itoh–Abe, Furihata and AVF discrete gradients are all equivalent. Order plots are given in Fig. 2. We consider fourth order discrete gradient methods where \(\overline{S}\) is given either dependent on or independent of \(\hat{x}\); that is, by (66) or (67), respectively. The implicitly given (66) yields a significantly lower error in the solution of the corresponding discrete gradient method, as seen in the left plot of Fig. 3. In contrast to what we observed for the canonical Hamiltonian system studied above, none of the discrete gradient methods give a global error lower than that of the fourth order Gauss–Legendre method.
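A corresponding sketch of the skew-gradient ingredients (94):

```python
import numpy as np

# Lotka–Volterra system, eq. (94)
def S_lv(x):
    x1, x2, x3 = x
    return 0.5 * np.array([[0.0, -x1 * x2, x1 * x3],
                           [x1 * x2, 0.0, -2 * x2 * x3],
                           [-x1 * x3, 2 * x2 * x3, 0.0]])

def grad_H_lv(x):
    x1, x2, x3 = x
    return np.array([2.0, 1.0 + 1.0 / x2, 2.0 - 2.0 / x3])

def f_lv(x):
    # Skew-gradient form (4): dx/dt = S(x) grad H(x)
    return S_lv(x) @ grad_H_lv(x)
```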

6.3 Hamiltonian neural networks

We present here results on the pendulum problem also considered in [19, 27]. Training data is first generated by adding noise to solutions of the system (6) with
$$\begin{aligned} S = \begin{pmatrix} 0 &{} 1\\ -1 &{} 0 \end{pmatrix}, \qquad H(q,p) = 2 m g l (1- \cos {q}) + \frac{l^2}{2 m} p^2, \end{aligned}$$
with \(l=m=1\) and \(g=3\), for various times \(t \in \left[ 0,20\right] \) and 50 randomly sampled initial coordinates. Then a Hamiltonian neural network (HNN) is used to learn the Hamiltonian. If the automatic discrete differentiation algorithm is used for the training, the corresponding discrete gradient is learned simultaneously. To compute \(Q(x,y)\), we require that the network also learns the Jacobian of the discrete gradient with respect to its second argument. This necessitates a modification of the network developed by Matsubara et al. [27], which we have implemented using the autograd class of PyTorch. The resulting Jacobian is in any case very useful for integrating the system by any discrete gradient method, since we can then employ a root-finding algorithm that uses (92), e.g. Newton's method.
We have tested second, third and fourth order discrete gradient methods both for training the network and for integrating the obtained dynamical system. For the fourth order scheme, we have used the \(\overline{S}\) given by (91) and for the third order scheme we have used
$$\begin{aligned} \overline{S}(x,\cdot ,h) = S + h S Q(x,z) S - \frac{1}{12} h^2 \, S D^2 H(x)S D^2 H(x)S, \end{aligned}$$
(95)
where \(z = x + \frac{2}{3} h S \nabla H(x)\). Note that \(\nabla H\) and \(D^2 H\) can be obtained from the network through the relations \(\nabla H(x) = \overline{\nabla }H(x,x)\) and \(D^2 H(x) = 2 D_2\overline{\nabla }H(x,x)\).
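A sketch of how these quantities might be extracted from a trained network with PyTorch. Here dgnet is a hypothetical module mapping \((x,y)\) to \(\overline{\nabla }H(x,y)\), as in the modified architecture described above.

```python
import torch
from torch.autograd.functional import jacobian

def derived_quantities(dgnet, x):
    """Recover grad H, the Hessian D2H and the Jacobian of the learned
    discrete gradient with respect to its second argument."""
    grad_H = dgnet(x, x)                        # grad H(x) = dg(x, x)
    D2_dg = jacobian(lambda y: dgnet(x, y), x)  # D_2 dg(x, y) at y = x
    D2H = 2.0 * D2_dg                           # D2H(x) = 2 D_2 dg(x, x)
    # Q(x, y) can likewise be formed from such Jacobians, cf. the remark
    # after (92).
    return grad_H, D2H, D2_dg
```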
As is also noted in [27], energy preservation seems to be more important than high order for the method used to train the system. In Fig. 4 we see an example of this: the second order discrete gradient method outperforms the fourth order Runge–Kutta method. Whether higher-order discrete gradient methods give an improvement in performance that justifies the extra computational cost is not clear from this example. A thorough investigation of this for more complex problems is a matter for future studies.
For energy-preserving integration of the learned system, much can be gained in accuracy by using a higher-order discrete gradient method, at little additional computational cost. The order plots in Fig. 5 are obtained by comparing to a reference solution computed by the fourth order discrete gradient method with a smaller step size. Python code for the neural networks used to produce the plots in Figs. 4 and 5, together with additional results on a mass–spring system, is available at https://github.com/solveeid/dgnet4.

7 Conclusions and future work

The main purpose of this paper has been to develop order theory for discrete gradient methods. This was achieved through the introduction of the function Q, Lemma 2 and a generalization of B- and P-series results. Propositions 3, 5, 6 and 7 present results which simplify the derivation of the order conditions from the general theory. The techniques introduced in this paper for building on B- and P-series methods can likely be applied to methods beyond the discrete gradient methods; future research may utilize this.
We have proposed some higher order schemes satisfying the derived order conditions. The development of specific schemes has, however, not been the main focus of this paper; an analysis to find optimized schemes is left for future work. Once such an analysis is performed, the methods could be tested on more advanced problems than those considered above, e.g. for the temporal discretization of Hamiltonian partial differential equations, and their performance, measured by accuracy relative to computational cost, could be compared to that of existing methods.
The use of neural networks to learn dynamical systems with preservation properties is an emerging field of study where new developments appear at a rapid pace. To our knowledge, the use of higher-order one-step energy-preserving integrators in this setting has not previously been studied, and may not even have been known to be possible before the results of this paper. The potential utility of the methods remains largely unexplored and will be considered in future work. This will include using methods of order higher than four, and analysis of the methods compared to and in connection with other recent developments, both in the training of the network and in the integration of the resulting system. It would also be interesting to investigate how the methods perform on more advanced examples, including Hamiltonian PDEs, and for Lagrangian neural networks [1, 11].
The order theory presented in this paper can also be developed further in several different directions. The schemes given with \(\overline{S}\) independent of \(\hat{x}\) are linearly implicit when H is quadratic; if the order theory is extended to the polarized discrete gradient methods of [12, 13, 28], we could get higher order linearly implicit multi-step schemes for systems with polynomial first integrals of any degree. Another avenue could be to consider order conditions for the discrete Riemannian gradient methods presented in [4]. The results in Sect. 5 are then especially interesting, since the integral in the AVF discrete Riemannian gradient can be challenging to compute analytically. Lastly, building on [2], order conditions for the exponential AVF method are given in [33]. These could be extended to exponential integrators using any discrete gradient by the theory presented in this paper.

Acknowledgements

The author is grateful to Brynjulf Owren for the invaluable suggestions he has provided during the writing of this paper. The author would like to thank the Isaac Newton Institute for Mathematical Sciences, Cambridge, for support and hospitality during the programme Geometry, compatibility and structure preservation in computational differential equations, where work on this paper was undertaken. This work was supported by EPSRC Grant No. EP/R014604/1.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Literatur
1. Aoshima, T., Matsubara, T., Yaguchi, T.: Deep discrete-time Lagrangian mechanics. ICLR 2021 Workshop on Deep Learning for Simulation (SimDL) (2021)
5.
8.
11. Cranmer, M., Greydanus, S., Hoyer, S., Battaglia, P., Spergel, D., Ho, S.: Lagrangian neural networks. arXiv preprint arXiv:2003.04630 (2020)
17. Furihata, D., Matsuo, T.: Discrete Variational Derivative Method: A Structure-Preserving Numerical Method for Partial Differential Equations. Chapman & Hall/CRC Numerical Analysis and Scientific Computing. CRC Press, Boca Raton (2011)
19. Greydanus, S., Dzamba, M., Yosinski, J.: Hamiltonian neural networks. Advances in Neural Information Processing Systems 32, 15379–15389 (2019)
20. Hairer, E.: Energy-preserving variant of collocation methods. JNAIAM J. Numer. Anal. Ind. Appl. Math. 5(1–2), 73–84 (2010)
21. Hairer, E., Lubich, C., Wanner, G.: Geometric Numerical Integration: Structure-Preserving Algorithms for Ordinary Differential Equations. Springer Series in Computational Mathematics, vol. 31, 2nd edn. Springer, Berlin (2006)
25. Kriksin, Y.A.: A conservative difference scheme for a system of Hamiltonian equations with external action. Zh. Vychisl. Mat. i Mat. Fiz. 33(2), 206–218 (1993)
26. Marsden, J.E.: Lectures on Mechanics. London Mathematical Society Lecture Note Series, vol. 174. Cambridge University Press, Cambridge (1992)
27. Matsubara, T., Ishikawa, A., Yaguchi, T.: Deep energy-based modeling of discrete-time physics. Adv. Neural Inf. Process. Syst. 33, 13100–13111 (2020)
32. McLaren, D.I., Quispel, G.R.W.: Bootstrapping discrete-gradient integral-preserving integrators to fourth order. In: Daniel, M., Rajasekar, S. (eds.) Nonlinear Dynamics, pp. 157–171. Narosa Publishing House (2009)
39. Zhong, Y.D., Dey, B., Chakraborty, A.: Symplectic ODE-Net: learning Hamiltonian dynamics with control. In: International Conference on Learning Representations (2019)