In the stiff situation, we consider the long-time behavior of the relative error \(\gamma _n\) in the numerical integration of a linear ordinary differential equation \(y^{\prime }(t)=Ay(t),\quad t\ge 0\), where A is a normal matrix. The numerical solution is obtained by using at any step an approximation of the matrix exponential, e.g. a polynomial or a rational approximation. We study the long-time behavior of \(\gamma _n\) by comparing it to the relative error \(\gamma _n^{\mathrm{long}}\) in the numerical integration of the long-time solution, i.e. the projection of the solution on the eigenspace of the rightmost eigenvalues. The error \( \gamma _n^{\mathrm{long}}\) grows linearly in time, it is small and it remains small in the long-time. We give a condition under which \(\gamma _n\approx \gamma _n^{\mathrm{long}}\), i.e. \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\approx 1\), in the long-time. When this condition does not hold, the ratio \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\) is large for all time. These results describe the long-time behavior of the relative error \(\gamma _n\) in the stiff situation.
where \(R:{\mathscr {D}}\subseteq {\mathbb {C}}\rightarrow {\mathbb {C}}\) is an analytic approximant of the exponential \(\mathrm{e}^{z}\), \(z\in {\mathbb {C}}\). When the numerical solution is obtained by a Runge–Kutta (RK) method, the approximant R is the stability function of the RK method and it is a polynomial or a rational function.
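As a minimal illustration of this recursion (not taken from the paper), the sketch below applies a hypothetical approximant R, chosen here as the order-2 Taylor polynomial, to a hypothetical diagonal (hence normal) matrix A, and computes the normwise relative error of the resulting numerical solution:

```python
import cmath

# Hypothetical example: a diagonal, hence normal, matrix A with eigenvalues
# -1 and -2, and the order-2 Taylor polynomial as the approximant R of e^z.
eigs = [-1.0, -2.0]
y0 = [1.0, 1.0]
h = 0.01

def R(z):
    # order-2 Taylor approximant of e^z
    return 1.0 + z + z * z / 2.0

def numerical_and_exact(n):
    # y_n = R(hA)^n y0 (one application of R(hA) per step)
    # and the exact solution y(t_n) = e^{t_n A} y0
    yn = [R(h * lam) ** n * c for lam, c in zip(eigs, y0)]
    yex = [cmath.exp(n * h * lam) * c for lam, c in zip(eigs, y0)]
    return yn, yex

def gamma(n):
    # normwise relative error of the numerical solution at t_n = n*h
    yn, yex = numerical_and_exact(n)
    num = sum(abs(a - b) ** 2 for a, b in zip(yn, yex)) ** 0.5
    den = sum(abs(b) ** 2 for b in yex) ** 0.5
    return num / den

g100 = gamma(100)  # small, since the stepsize resolves both eigenvalues
```

For a diagonal A the recursion decouples into scalar recursions, which is what makes this sketch so short; the eigenvalues, stepsize and approximant above are illustrative choices only.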
The paper [9] analyzed, in the non-stiff situation, the time behavior of the normwise relative error
in the case of a normal matrix A. It seems to be the first paper in the literature dealing in detail with the time behavior of the relative error of numerical solutions of ODEs. This is quite surprising, because relative errors are generally considered better quality measures of approximations than absolute errors. Indeed, componentwise relative errors are involved in the stepsize control mechanism (see [12]).
The present paper continues the analysis of the error \(\gamma _n\), in the case of a normal A, by considering its long-time behavior in the stiff situation. The next subsection, with all its subsubsections, contains the basic material needed for this analysis. Part of this material was introduced in [9].
1.1 Fundamental notations and notions
1.1.1 Small and large
We stipulate that, for \(a\ge 0\), “a is small” means “\(a\ll 1\)” and “a is large” means “\(a\gg 1\)”.
For \(b\ge 0\) and \(c>0\), \(b\ll c\) means \(\frac{b}{c}\ll 1\).
1.1.2 The notation \(\approx \)
For \(a,b\in {\mathbb {R}}\), \(a\approx b\) means
We say \(a\approx b\) with degree \(\varepsilon \), where \(\varepsilon >0\), if \(\frac{\Vert a-b\Vert _2}{\Vert b\Vert _2}\le \varepsilon \).
1.1.3 The meaning of “it is expected”
In the paper, we often say “it is expected S”, where S is a statement, with the meaning that the statement “not S” is “unlikely”, “unusual” or “extreme”.
Sentences of this form can seem vague, although they convey significant information. However, they are never used in definitions or theorems, which are stated precisely, without any such vagueness. These sentences only serve a better understanding of technical notions and results.
By introducing probability measures on the data, we could make “it is expected S” mathematically precise, but this is beyond the scope of the present paper.
of the normal matrix A, where \(\lambda _{1},\lambda _{2},\ldots , \lambda _{p}\) are the distinct eigenvalues of A, is partitioned by decreasing real part in the subsets \(\varLambda _{1},\varLambda _{2},\ldots , \varLambda _{q}\) (see Fig. 1): we have
The generic situation for the initial value \(y_0\) is \(\varLambda ^*=\varLambda \). In order to use simpler notations, we assume this generic situation.
If it does not hold, then below we have to see \(\varLambda _1,\ldots ,\varLambda _q\) as \(\varLambda _{1}^*,\ldots ,\varLambda _q^*\) without the sets \(\varLambda ^{*}_j\) that are empty. In other words, we see \(\varLambda _1\) as \(\varLambda _{j_1^*}\) where
and so on. Of course, when we do this, the number q of sets in \(\varLambda _1,\ldots ,\varLambda _q\) is no longer equal to the number of possible real parts in the spectrum \(\varLambda \), but it is equal to the number of possible real parts in \(\varLambda ^*\).
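The partition just described can be sketched on a hypothetical spectrum: group the distinct eigenvalues by real part and order the groups by decreasing real part, so that the first group is \(\varLambda _1\), the set of the rightmost eigenvalues. The spectrum below is an illustrative choice, not one from the paper.

```python
# Sketch of the partition of the spectrum into Lambda_1, ..., Lambda_q
# (subsets of equal real part, ordered by decreasing real part).
def partition_by_real_part(spectrum):
    groups = {}
    for lam in spectrum:
        # eigenvalues with (numerically) equal real part go in the same subset
        groups.setdefault(round(lam.real, 12), []).append(lam)
    # decreasing real part: the first subset holds the rightmost eigenvalues
    return [sorted(groups[r], key=lambda z: z.imag)
            for r in sorted(groups, reverse=True)]

spectrum = [-1 + 1j, -1 - 1j, -3 + 1000j, -3 - 1000j, -5 + 0j]
parts = partition_by_real_part(spectrum)  # q = 3 subsets here
```

With this spectrum, `parts[0]` is \(\varLambda _1=\{-1\pm i\}\) and the last subset contains the leftmost eigenvalue \(-5\).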
1.1.6 Rightmost and non-rightmost eigenvalues
The set \(\varLambda _1\) is the set of the rightmost eigenvalues. The set
It is expected \(\vert \beta _2\vert \) non-small.
1.1.8 Dimensionless quantities
We use the dimensionless stepsize \(h\rho _1\), or \(h\rho \), and the dimensionless time \(t\rho _1\), or \(t\rho \), rather than the stepsize h and the time t, respectively, because they are small or large independently of the unit used for time.
In this paper, when we say that a certain quantity is small or large, this quantity is always dimensionless.
The numbers \(\beta _j\) defined above are dimensionless, as well as the errors \(\sigma _i\) now introduced.
1.1.9 The errors \(\sigma _i\)
We assume that the approximant R has order l, where l is a positive integer. This means
with \(C\ne 0\). It is assumed that the domain \({\mathscr {D}}\) of R includes a neighborhood of zero. Moreover, we assume \(h\lambda _i\in {\mathscr {D}}\), \(i=1,\ldots ,p\).
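The displayed formula defining \(\sigma _i\) is not reproduced in this excerpt; a reading consistent with the surrounding text (\(\sigma _i\) as the local relative error of the approximant at \(h\lambda _i\)) is sketched below, with the explicit Euler stability function (order \(l=1\)) as a hypothetical choice of R:

```python
import cmath

# Assumed reading (the paper's displayed definition is elided here):
# sigma_i = (R(h*lambda_i) - e^{h*lambda_i}) / e^{h*lambda_i},
# i.e. the relative error of the approximant at h*lambda_i.
def sigma(R, h, lam):
    ez = cmath.exp(h * lam)
    return (R(h * lam) - ez) / ez

# explicit Euler stability function: order-1 Taylor approximant, so l = 1
R_euler = lambda z: 1 + z

h = 0.01
s = sigma(R_euler, h, -1.0)  # ≈ -h^2/2, consistent with order l = 1
```

For an approximant of order l, this quantity behaves like \(C(h\lambda _i)^{l+1}\) for \(h\lambda _i\rightarrow 0\), which the Euler example shows with \(l=1\) and \(C=-\frac{1}{2}\).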
We can consider \(\max \nolimits _{\lambda _{i}\in \varLambda _{1}}\left| \sigma _{i}\right| \) and \(\max \nolimits _{\lambda _{i}\in \varLambda }\left| \sigma _{i}\right| \) as local relative errors, and \(E_1\) and E as global relative errors, of the numerical integration. An explanation for this is given below at point 2 of Remarks 1.1 and 1.2.
The right-hand side follows by (1.9). Observe that the generic situation for the matrix A is to have \(\varLambda _1\) constituted by a single real eigenvalue or by a single pair of complex conjugate eigenvalues. In this generic situation, we have \(K_{1}=1\).
We call base situation the situation where \(\max \nolimits _{\lambda _{i}\in \varLambda _{1}}\left| \sigma _{i}\right| \) is small.
Here are some observations about the base situation.
In the base situation, it is expected \(E_1\) small, i.e. \(\max \nolimits _{\lambda _{i}\in \varLambda _{1}}\left| \sigma _{i}\right| \ll h\rho _1\), and \(h\rho _1\) non-large. Look at (1.10).
We do not say that in the base situation it is expected that \(h\rho _1\) is small. In fact, when R is a high-order approximant, we do not regard the case where \(\max \nolimits _{\lambda _{i}\in \varLambda _{1}}\left| \sigma _{i}\right| \) is small and \(h\rho _1\) is not small as “unusual”.
1.1.14 The non-stiff situation and the stiff situation
The base situation is partitioned into two disjoint sub-situations: the non-stiff situation and the stiff situation.
We call non-stiff situation (stiff situation) the sub-situation of the base situation where \(\max \nolimits _{\lambda _{i}\in \varLambda }\left| \sigma _{i}\right| \) is small ( \(\max \nolimits _{\lambda _{i}\in \varLambda }\left| \sigma _{i}\right| \) is not small), equivalently \(\max \nolimits _{\lambda _{i}\in \varLambda ^{-}}\left| \sigma _{i}\right| \) is small (\(\max \nolimits _{\lambda _{i}\in \varLambda ^{-}}\left| \sigma _{i}\right| \) is not small).
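This classification can be sketched on hypothetical data, again under the assumption that \(\sigma _i\) is the local relative error of R at \(h\lambda _i\) (its displayed definition is not reproduced in this excerpt), and with “small” rendered by a hypothetical numerical tolerance:

```python
import cmath

# Sketch of the classification: base situation requires small errors on the
# rightmost eigenvalues; within it, "stiff" means the errors on the
# non-rightmost eigenvalues (Lambda^-) are not small.
def classify(R, h, rightmost, non_rightmost, small=1e-2):
    err = lambda lam: abs((R(h * lam) - cmath.exp(h * lam)) / cmath.exp(h * lam))
    if max(err(l) for l in rightmost) >= small:
        return "not base"            # not in the base situation
    if max(err(l) for l in non_rightmost) < small:
        return "non-stiff"           # all sigma_i small
    return "stiff"                   # small on Lambda_1, not small on Lambda^-

R2 = lambda z: 1 + z + z * z / 2     # order-2 Taylor approximant
verdict = classify(R2, 0.1, rightmost=[-1.0], non_rightmost=[-100.0])
```

With the stepsize tuned to the rightmost eigenvalue \(-1\), the far-left eigenvalue \(-100\) makes the example stiff; replacing it by \(-2\) would give the non-stiff verdict.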
The non-stiff situation and the stiff situation correspond to what is meant as non-stiff and stiff in the traditional terminology of numerical ODEs. The explanation is given below at point 3 of Remark 1.2.
Here are some observations about the non-stiff and stiff situations.
In the non-stiff situation, it is expected E small, i.e. \(\max \nolimits _{\lambda _{i}\in \varLambda }\left| \sigma _{i}\right| \ll h\rho \), and \(h\rho \) non-large. Look at (1.11).
small. In fact, it is expected \(E_1\) small, and having both \(\max \nolimits _{\lambda _{i}\in \varLambda ^{-}}|\sigma _{i}|\) and \(\max \nolimits _{\lambda _{i}\in \varLambda _1}|\sigma _{i}|\) small with their ratio M not satisfying \(M\ll \frac{1}{E_1}\) appears to be an “extreme” case.
In the stiff situation, it is expected \(h\rho \) non-small. In fact, to have \(\max \nolimits _{\lambda _{i}\in \varLambda }\left| \sigma _{i}\right| \) non-small with \(h\rho \) small appears to be “unlikely”.
In the stiff situation, M is large, since it is the ratio of a non-small number to a small number.
The function g is increasing with \(g(0)=0\). We have \(g(c)\approx \frac{c}{2}\) for c small, \(g(1)=0.71828\) and \(g(c)=1\) for \(c=1.2564\).
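The definition of g is given earlier in the paper and is not reproduced in this excerpt; the closed form used in the snippet below is therefore an assumption, chosen because it reproduces all three values quoted in the text, and the snippet checks them numerically:

```python
import math

# Assumed closed form for g (not confirmed by this excerpt): it is increasing,
# g(0+) = 0, and it matches the three values quoted in the text.
def g(c):
    return (math.exp(c) - 1.0 - c) / c

slope = g(1e-3) / 1e-3   # ≈ 1/2, i.e. g(c) ≈ c/2 for small c
g_one = g(1.0)           # = e - 2 = 0.71828...
g_root = g(1.2564)       # ≈ 1
```

Under this assumption, \(g(c)=1\) exactly when \(\mathrm{e}^{c}=2c+1\), whose positive root is \(c\approx 1.2564\), as stated.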
1.2 Analysis of the error \(\gamma _n\)
After having introduced the basic material in the previous subsection, we can proceed with our analysis of the error \(\gamma _n\).
The next theorem (Theorem 4.1 in [9], stated with E instead of \(\max \nolimits _{\lambda _{i}\in \varLambda }\left| \sigma _{i}\right| \)) describes how the error \(\gamma _{n}\) grows in time.
Theorem 1.1
Assume \(0\notin \varLambda \). Fix \(c>0\). For \(t_n\rho \le \frac{c}{E}\), we have
If \(E\ll 1\), then (1.16) says that \(\gamma _{n}\) is small and grows linearly in time up to large times \(t_n\rho \), precisely up to the large time \(\frac{1}{E}\). This result is useful in the non-stiff situation, where it is expected \(E\ll 1\).
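The linear-in-time growth of \(\gamma _n\) can be observed numerically on a hypothetical example (a diagonal normal matrix with an order-2 Taylor approximant and a small dimensionless stepsize, none of which come from the paper): doubling the time doubles the error, as long as the error stays small.

```python
import math

# Numerical illustration of the linear growth in Theorem 1.1 with
# A = diag(-1, -2) (normal) and the order-2 Taylor approximant.
eigs = [-1.0, -2.0]
y0 = [1.0, 1.0]
h = 0.01
R = lambda z: 1 + z + z * z / 2

def gamma(n):
    # normwise relative error at t_n = n*h
    num2 = den2 = 0.0
    for lam, c in zip(eigs, y0):
        exact = math.exp(n * h * lam) * c
        num2 += (R(h * lam) ** n * c - exact) ** 2
        den2 += exact ** 2
    return math.sqrt(num2 / den2)

ratio = gamma(2000) / gamma(1000)  # ≈ 2: the growth is linear in time
```

The check is only meaningful while \(\gamma _n\) is still small; for much larger n the linear regime of the theorem is left.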
Remark 1.1
1.
By taking a small c in the previous theorem, we have
$$\begin{aligned} \frac{t_n\rho E}{K}\lessapprox \gamma _{n}\lessapprox t_n\rho E. \end{aligned}$$
To be more precise, this holds for times \(t_n\rho \le x\), where \(x>0\) is such that \(xE\ll 1\).
This explains why \(\max \limits _{\lambda _{i}\in \varLambda }\left| \sigma _{i}\right| \) can be considered as the local relative error in the numerical integration of the solution. At \(t_n\rho =1\), we have
This explains why E can be considered as the global relative error in the numerical integration of the solution.
3.
The theorem assumes \(0\notin \varLambda \). If \(\varLambda =\{0\}\), we have \(\gamma _n=0\) for any n. For the case \(0\in \varLambda \) and \(\varLambda \ne \{0\}\), see point 5 of Remark 4.1 in [9].
1.2.1 The long-time solution
Let \(y^{\mathrm{long}}\) be the solution of (1.1) with initial value \( Q_{1}y_{0}\) instead of \(y_0\).
The solution \(y^{\mathrm{long}}\) is the long-time solution of (1.1), since we have \(y(t)\approx y^{\mathrm{long}}(t)\) for \(t\rho _1\) large. In particular, we have \(y(t)\approx y^{\mathrm{long}} (t)\) with degree \(\varepsilon \), where \(\varepsilon >0\), if
(see Theorem 5.1 in [9]). Observe that the left-hand side of (1.17) goes to zero as \(t\rho _1\rightarrow +\infty \).
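The long-time solution can be sketched on a hypothetical diagonal example, where the spectral projection \(Q_1\) simply keeps the components associated with the rightmost eigenvalues; the eigenvalues and initial value below are illustrative choices.

```python
import math

# Sketch: project y0 on the eigenspace of the rightmost eigenvalues.
# For a diagonal A, Q_1 zeroes the components outside that eigenspace.
eigs = [-1.0, -1.0, -4.0]   # rightmost eigenvalue -1 (twice)
y0 = [2.0, -1.0, 3.0]

def y(t):
    return [math.exp(lam * t) * c for lam, c in zip(eigs, y0)]

def y_long(t):
    rmax = max(eigs)
    return [math.exp(lam * t) * (c if lam == rmax else 0.0)
            for lam, c in zip(eigs, y0)]

def rel_dist(t):
    # relative distance between y(t) and y_long(t)
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(y(t), y_long(t))))
    den = math.sqrt(sum(b ** 2 for b in y_long(t)))
    return num / den

d = rel_dist(5.0)  # small: y(t) ≈ y_long(t) once t*rho_1 is large
```

The relative distance decays like \(\mathrm{e}^{-(1-4)t}\cdot\text{const}\) in this example, mirroring the statement that the left-hand side of (1.17) goes to zero as \(t\rho _1\rightarrow +\infty \).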
1.2.2 The error \(\gamma _n^{\mathrm{long}}\)
Let \(\gamma _{n}^{\mathrm{long}}\) be the error \(\gamma _n\) of the long-time solution \(y^{\mathrm{long}}\). The next theorem (Theorem 5.2 in [9], stated with \(E_1\) instead of \(\max \nolimits _{\lambda _{i}\in \varLambda _1}\left| \sigma _{i}\right| \)) describes how the error \(\gamma _{n}^{\mathrm{long}}\) grows in time.
Theorem 1.2
Assume \(0\notin \varLambda _{1}\). Fix \(c>0\). For \(t_n\rho _1\le \frac{c}{E_1}\), we have
If \(E_1\ll 1\), then (1.18) says that \(\gamma _{n}^{\mathrm{long}}\) is small and grows linearly in time up to large times \(t_n\rho _1\), precisely up to the large time \(\frac{1}{E_1}\). This result is useful in the base situation, where it is expected \(E_1\ll 1\).
Remark 1.2
1.
By taking a small c in the previous theorem, we have
This holds for times \(t_n\rho _1\le x\), where \(x>0\) is such that \(xE_1\ll 1\). If \(\varLambda _{1}\) is constituted by a single real eigenvalue or by a single pair of complex conjugate eigenvalues (the generic situation for the matrix A), we have \(K_1=1\) and then
Similarly to point 1 of Remark 1.1, we can explain why \(\max \nolimits _{\lambda _{i}\in \varLambda _{1}}\left| \sigma _{i}\right| \) and \(E_1\) can be considered as the local relative error and the global relative error, respectively, in the numerical integration of the long-time solution.
3.
Since \(\max \nolimits _{\lambda _{i}\in \varLambda }\left| \sigma _{i}\right| \) and \(\max \nolimits _{\lambda _{i}\in \varLambda _{1}}\left| \sigma _{i}\right| \) can be considered as local relative errors in the numerical integration of the solution and the long-time solution, respectively, we can say that in the non-stiff situation the local relative error of the solution is small, whereas in the stiff situation the local relative error of the solution is not small, but the local relative error of the long-time solution is small. This agrees with the traditional concepts of non-stiff and stiff.
4.
The theorem assumes \(0\notin \varLambda _1\). If \(\varLambda _1=\{0\}\), we have \(\gamma _n^{\mathrm{long}}=0\) for any n. For the case \(0\in \varLambda _1\) and \(\varLambda _1\ne \{0\}\), see point 5 of Remark 5.2 in [9].
1.2.3 Long-time behavior of \(\gamma _n\)
We want to study the long-time behavior of the error \(\gamma _n\). This is done by comparing it to the error \(\gamma _{n}^{\mathrm{long}}\).
Since in the long-time the solution y becomes the solution \(y^{\mathrm{long}}\) whose error \(\gamma _n\) is just \(\gamma _{n}^{\mathrm{long}}\), it is quite reasonable to have \(\gamma _{n}\approx \gamma _{n}^{\mathrm{long}}\) in the long-time.
Indeed, at point 4 of Remark 5.3 in [9], the following result is stated.
Theorem 1.3
Assume \(q>1\) and \(0\notin \varLambda _1\). Fix \(c>0\) such that \(g(c)<1\), i.e. \(c<1.2564\). For any \(\varepsilon >0\), there exist \(H_0>0\) (independent of \(\varepsilon \)) and \(s\ge 0\) (dependent on \(\varepsilon \)) such that, for \(h\rho \le H_0\) and \(s\le t_n\rho \le \frac{c}{E}\), we have \(\gamma _{n}\approx \gamma _{n}^{\mathrm{long}}\) with degree \(\varepsilon \).
Remark 1.3
The theorem assumes \(q>1\). If \(q=1\), then \(\gamma _{n}=\gamma _{n}^{\mathrm{long}}\) for any n. In addition, it assumes \(0\notin \varLambda _1\). If \(q>1\) and \(\varLambda _1=\left\{ 0\right\} \), then \(\gamma _{n}^{\mathrm{long}}=0\) for any n and it does not make sense to look at \(\gamma _{n}\approx \gamma _{n}^{\mathrm{long}}\), since this would imply \(\gamma _{n}=0\). For the case \(q>1\), \(0\in \varLambda _1\) and \(\varLambda _1\ne \left\{ 0\right\} \), see point 6 of Remark 5.3 in [9].
The previous theorem is of interest in the non-stiff situation, where the condition \(h\rho \le H_0\) is not restrictive. In fact, in the non-stiff situation it is expected \(h\rho \) non-large.
On the other hand, the result is not useful in the stiff situation, since the condition \(h\rho \le H_0\) is restrictive. In fact, in the stiff situation it is expected \(h\rho \) non-small.
1.3 The contents of this paper
The present paper studies the long-time behavior of the relative error \(\gamma _n\) in the stiff situation. As above, this is done by comparing it to \(\gamma _{n}^{\mathrm{long}}\).
In the stiff situation, it is important to have \(\gamma _{n}\approx \gamma _{n}^{\mathrm{long}}\) in the long-time. If this happens, since \(\gamma _{n}^{\mathrm{long}}\) is small up to large times \(t_n\rho _1\), we have the very surprising fact that the error \(\gamma _n\) is small in the long-time, even though the stepsize h is tuned only to obtain a small local relative error for the long-time solution and, as a consequence, the local relative error of the solution is not small.
In other words, when we are interested in the numerical integration of the solution in the long-time, we can start from the beginning with a stepsize suitable for integrating the long-time solution with a small local relative error, which is larger than the stepsize needed to integrate the solution itself with a small local relative error, and in the long-time we will have a small error \(\gamma _n\).
As in [9], we confine our attention to normal matrices. This should not be considered a limitation, since the class of normal matrices is large enough to include important types of matrices and, moreover, the test problem (1.1) with A normal exhibits unexplored and interesting situations in numerical ODEs.
The plan of the paper is as follows.
Section 2 shows two examples of the stiff situation where \(\gamma _{n}\approx \gamma _{n}^{\mathrm{long}}\) fails in the long-time, with \(\gamma _n\) non-small and growing unboundedly.
Section 3 introduces the definition of “\(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time” of our interest.
Section 4 gives the condition for having, in the stiff situation, \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time.
Section 5 shows that when this condition does not hold, we have, in the stiff situation, \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time.
Section 6 revises the examples of Sect. 2 in the light of the results of Sects. 4 and 5.
Section 7 studies when the condition for having \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) holds independently of the specific non-rightmost spectrum.
This final subsection includes replies to general questions or criticisms that could be raised about the contents of this paper.
Question. What is the motivation of this paper?
Reply. This paper studies the relative error of numerical approximations of ODEs, although confined to linear systems with a normal matrix. Of course, the absolute error and the relative error of the numerical approximations have the same order of convergence with respect to the stepsize h, but they have different time behaviors in the numerical integration of a solution spanning several orders of magnitude.
The motivation for studying the relative error time behavior in numerical ODEs, as the present paper is doing, comes from the following two facts:
as is widely recognized, the relative error is an important measure of the quality of an approximation, often better than the absolute error;
the numerical ODEs community has paid no attention to the time behavior of the relative error of numerical approximations.
In any case, the fact that in the numerical ODEs field the relative error is considered important is attested by the numerical solvers, which accept as an input argument a tolerance on the componentwise relative error. Thus, this paper (similarly to [9] and [10]) tries to fill this gap between theory, where there are no studies on the relative error, and practice, where the relative error is used.
Question. What is the relevance of the results achieved?
Reply. For the numerical ODEs community, it should be of interest to know the time behavior of the relative error of numerical approximations of the ODE (1.1) with A normal. The results achieved describe this time behavior, and their relevance is that they give a new perspective on the numerical integration errors. We can summarize this new perspective in the following points.
In the non-stiff situation, the relative error is small and it grows linearly in time. Moreover, this linear growth is determined in the long-time only by the rightmost eigenvalues.
In the stiff situation, the relative error is not small at the beginning of the numerical integration and it is not guaranteed that in the long-time it will become small, with a linear growth determined only by the rightmost eigenvalues. This happens if and only if a certain condition is satisfied and this condition is a novelty in the numerical ODE theory.
Gauss RK methods, although they are considered stable in the classic numerical linear stability theory (they are A-stable methods), are not suitable for having the above condition satisfied. On the other hand, Radau and Lobatto IIIC RK methods are suitable for having this condition satisfied.
where \(y_{n,i}\) and \(y_i(t_n)\), \(i=1,\ldots ,d\), are the components of \(y_n\) and \(y(t_n)\), should be considered (as in the numerical ODE solvers), not the normwise relative error (1.3).
Reply. In the literature, both normwise relative errors and componentwise relative errors are considered as quality measures of vector approximations (see [2]). The componentwise approach has the advantage of giving information on the precision of the components, but it has the drawback that the components must be nonzero (when some component becomes zero, we need to switch to the absolute error). On the other hand, the normwise approach can still give information about the componentwise relative errors (for example, a large normwise relative error implies that some component has a large relative error) and it also works when some component becomes zero.
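A two-component toy example (hypothetical numbers, not from the paper) makes the last point concrete: the normwise relative error remains defined when a component of the exact vector is zero, exactly where the componentwise relative error would require switching to the absolute error.

```python
import math

# Normwise relative error of an approximation of a vector `exact`.
def normwise_rel_err(approx, exact):
    num = math.sqrt(sum((a - b) ** 2 for a, b in zip(approx, exact)))
    return num / math.sqrt(sum(e ** 2 for e in exact))

exact = [1.0, 0.0]     # componentwise relative error undefined for 2nd entry
approx = [1.0, 0.5]
nw = normwise_rel_err(approx, exact)  # 0.5, no division by a zero component
```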
Criticism. Relative errors should not be considered in situations where the exact solution approaches zero, such as those studied in this paper. A rule of thumb in numerical analysis says that one should switch to the absolute error in this situation.
Reply. In mathematical modeling and numerical analysis, there is a threshold in the order of magnitude of quantities (scalars or vectors) under which they are considered zero. Under the threshold, it is important to use the absolute error for approximations, since they are approximations of zero. But, in the case of a solution of (1.1) which is going to zero, and so spans several orders of magnitude, it could be of interest to compute this solution with good precision for the orders of magnitude larger than the threshold. In this situation, the relative error is important.
Of course, the numerical analyst’s point of view is that the threshold is the order of magnitude of the machine epsilon, but in applications this threshold can be larger.
As an example, we can consider the radioactive decay of radionuclides, where the activity a(t) (measured in becquerel (Bq) by a Geiger counter) of a given amount of radionuclide satisfies \(a^{\prime }(t)=-\lambda a(t)\) with \(\lambda >0\). For a decay chain, we have \(a^{\prime }(t)=A a(t)\), where A is a lower bi-diagonal matrix: the so-called Bateman equation. The threshold could be the order of magnitude \(10^2\ \mathrm{Bq/kg}\) of the background radiation. Of course, this threshold becomes a much smaller power of ten by using a unit larger than the becquerel, e.g. the curie. It could be interesting to compute numerically, with good precision, a solution a(t) whose initial value has order of magnitude \(10^6\ \mathrm{Bq/kg}\) (as in a nuclear plant accident). Since the solution becomes small compared to the initial value, using the relative error for the approximations of the solution is better than using the absolute error, as long as the solution is not yet considered zero.
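A sketch with entirely hypothetical numbers (initial activity, decay constant and absolute error are illustrative choices) shows why: over the four orders of magnitude between \(10^6\) and \(10^2\ \mathrm{Bq/kg}\), a fixed absolute error that is negligible at \(t=0\) becomes a large relative error near the threshold.

```python
import math

# Hypothetical data: activity decaying from 1e6 Bq/kg toward the
# 1e2 Bq/kg background threshold, half-life of one time unit.
a0 = 1e6
lam = math.log(2.0)
threshold = 1e2

def a(t):
    return a0 * math.exp(-lam * t)

abs_err = 1.0                           # hypothetical fixed absolute error (Bq/kg)
rel_at_start = abs_err / a(0.0)         # 1e-6: negligible
t_thr = math.log(a0 / threshold) / lam  # time at which a(t) hits the threshold
rel_at_thr = abs_err / a(t_thr)         # 1e-2: four orders of magnitude larger
```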
Another example could be a space discretization of the heat equation, with homogeneous Dirichlet boundary condition, by the method of lines. In this case, the space discrete temperature approaches zero (the border temperature) and under a given threshold in the order of magnitude, say \(10^{-2}\ ^\circ \mathrm{C}\), it can be considered zero. But, over this order of magnitude, the temperature is not zero and it becomes important to use the relative error for time-space approximations, especially when the solution spans over several orders of magnitude due to an initial value with order of magnitude larger than the threshold, for example \(10^2\ ^\circ \mathrm{C}\).
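The method-of-lines matrix of this example can be sketched directly (the grid size below is an illustrative choice): the 1D Laplacian with homogeneous Dirichlet conditions is symmetric, hence normal, and its closed-form eigenvalues spread over many time scales, which is the stiffness at play in this paper.

```python
import math

# 1D heat equation u_t = u_xx on (0, 1), homogeneous Dirichlet conditions,
# discretized at m interior points: y' = A y with A tridiagonal symmetric.
# Eigenvalues of A: -4/dx^2 * sin^2(k*pi*dx/2), k = 1, ..., m.
m = 99
dx = 1.0 / (m + 1)
eigs = [-4.0 / dx ** 2 * math.sin(k * math.pi * dx / 2.0) ** 2
        for k in range(1, m + 1)]

rightmost = max(eigs)                    # ≈ -pi^2: the slow mode
leftmost = min(eigs)                     # ≈ -4/dx^2: the fastest mode
spread = abs(leftmost) / abs(rightmost)  # large ratio of time scales
```

Refining the grid makes `spread` grow like \(dx^{-2}\), so a stepsize tuned to the slow modes inevitably puts the fast modes in the stiff regime.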
We remark that the analysis in this paper also considers the situation where the solution, instead of approaching zero, grows to values that are large with respect to the initial value. In this situation, too, the relative error is important.
Criticism. The paper considers ODEs (1.1) with matrix A normal. Such problems can be diagonalized with a unitary transformation and then one can assume without loss of generality that A is diagonal.
Reply. In the paper, we do not assume from the beginning that A is diagonal, because this would not simplify the exposition. In fact, the analysis presented starts from the fundamental relation (4.6) given below for the relative error, which maintains the same form when A is diagonal. We have such a clean expression for the relative error precisely because of the possibility of reducing to the diagonal case by a unitary transformation. Hence, the assumption that A is diagonal is already implicit once one decides to deal with a normal matrix.
Criticism. Since it is possible to reduce to the diagonal case, it would be sufficient to study the behavior of the numerical scheme at a scalar problem, which is really trivial.
Reply. Although we can reduce to a linear system of uncoupled scalar differential equations, this does not mean that they are fully uncoupled in the numerical scheme, since the same stepsize h is used in all scalar equations. This reflects the fact that the numerical scheme is applied to an ODE (1.1) with a matrix A that is in general non-diagonal, without diagonalizing it in advance. Moreover, the analysis of the present paper requires rightmost and non-rightmost eigenvalues; in other words, we need eigenvalues with different real parts, i.e. an ODE (1.1) with different time scales. The case of a single scalar equation is not considered. Anyway, we can observe that, in the base situation for a scalar equation, the relative error \(\gamma _n=\gamma _n^{\mathrm{long}}\) is expected to be small and to grow linearly in time up to large times.
2 Examples
In this section, we give two examples of stiff situations where the error \(\gamma _{n}\) is not small from the beginning of the numerical integration and grows without approaching, in the long-time, the small error \(\gamma _{n}^{\mathrm{long}}\).
We remind that the stability region of the approximant R (see [5]) is the set
whose eigenvalues are a and b with corresponding eigenvectors \(\left( 1,1\right) \) and \(\left( 1,-1\right) \), respectively. We consider \(a=-1\) and the following three possibilities for b:
(P1)
\(b=-11\);
(P2)
\(b=-13.5\);
(P3)
\(b=-16\).
The initial value is \(y_{0}=\left( 2,-1\right) \), for which we have
For the possibility (P1), we see in Fig. 2, for \(n=0,1,2,\ldots ,N\), the relative errors \(\gamma _{n}\) (solid red line) and \(\gamma _n^{\mathrm{long}}\) (dashed blue line).
Starting from a non-small \(\gamma _1\) (recall that \(\gamma _0=0\)), the error \(\gamma _{n}\) goes down to the small error \(\gamma _n^{\mathrm{long}}\). In the long-time, we have small errors \(\gamma _{n}\), although the stepsize is tuned only for having a small \(\sigma _{1}\), without any concern about \(\sigma _{2}\).
For the possibility (P2), we see in Fig. 3 the same as in Fig. 2. As in (P1), starting from a non-small \(\gamma _1\), the error \(\gamma _n\) goes down to \(\gamma _n^{\mathrm{long}}\), although \(\gamma _n^{\mathrm{long}}\) is reached at a later time than in (P1).
Finally, for the possibility (P3), we see in Fig. 4 the same as in Figs. 2 and 3. Unlike (P1) and (P2), the error \(\gamma _{n}\) does not go down to \(\gamma _n^{\mathrm{long}}\), but continues to grow.
2.1.1 Order star and stability region
Having fixed \(a=-1\), we are interested in understanding for which b, with \(b<a\), we have \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time. This happens in (P1) and (P2), but not in (P3).
Order star and stability region for the Taylor approximant of order five are depicted in Fig. 5.
The values of \(\left| S\left( hb\right) \right| \) and \(\left| R\left( hb\right) \right| \) are:
Observe that \(hb\in {\mathscr {R}}\) for all three possibilities and \(hb\in {\mathscr {S}}^{c}\) only in (P1). In other words, by looking at the negative real axis of Fig. 5, hb lies in the red region for all three possibilities and hb lies in the white finger only in (P1).
In Sect. 4, we will see a condition on hb for having \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time. When it does not hold, we have \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time. The condition is something between \(hb\in {\mathscr {S}}^c\) (i.e. to stay in the white finger) and \(hb\in {\mathscr {R}}\) (i.e. to stay in the red region): to have \(hb\in {\mathscr {S}}^c\) is sufficient, but not necessary, for this condition on hb, and to have \(hb\in {\mathscr {R}}\) is necessary, but not sufficient.
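The two sets can be probed numerically for the Taylor approximant of order five. Since the stepsize h and the quoted values of \(\left| S\left( hb\right) \right| \) and \(\left| R\left( hb\right) \right| \) are not reproduced in this excerpt, the points probed below are hypothetical; moreover, the sketch reads \({\mathscr {R}}\) as the stability region \(\{z:\left| R(z)\right| \le 1\}\) and \({\mathscr {S}}\) as the order star region \(\{z:\left| R(z)\right| \ge \left| \mathrm{e}^{z}\right| \}\), consistently with how both sets are used here and in Sect. 2.2.1.

```python
import cmath
import math

# Taylor approximant of e^z of order five
def R(z):
    return sum(z ** k / math.factorial(k) for k in range(6))

def in_stability_region(z):      # assumed reading of the set R
    return abs(R(z)) <= 1.0

def in_white_finger(z):          # assumed reading of the set S^c
    return abs(R(z) / cmath.exp(z)) < 1.0

# hypothetical probe points on the negative real axis: close to the origin
# both tests pass; far out on the negative real axis both fail
near_ok = in_stability_region(-1.0) and in_white_finger(-1.0)
far_ok = (not in_stability_region(-6.0)) and (not in_white_finger(-6.0))
```

Moving hb leftward along the negative real axis thus eventually leaves both sets, which is the mechanism behind the three possibilities (P1)–(P3).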
2.2 Same ODE with different approximants
As second example, we consider the ODE (1.1) with the normal matrix
The solution y consists of two decaying oscillations: the fast oscillation \(y-y^{\mathrm{long}}\) decays faster than the slow oscillation \(y^{\mathrm{long}}\) and, in the long-time, only the slow oscillation is present. We have \(y(t)\approx y^{\mathrm{long}}(t)\) if
Assume that the numerical integration of the ODE is accomplished by the fourth-order two-stage Gauss RK method, corresponding to the \((2,2)\)-Padé approximant
Both methods are applied with stepsize \(h=\frac{1}{10}\) over \(N=100\) steps up to \(t_N=Nh=10\). Observe that such a stepsize is not suitable for approximating the fast oscillation.
$$\begin{aligned} \gamma _{n}^{\mathrm{long}}\approx t_n\rho _1 E_1 =\left\{ \begin{array}{l} t_n\cdot 7.86\cdot 10^{-7} \quad\text { for the Gauss RK method}\\ t_n\cdot 5.41\cdot 10^{-5} \quad \text { for the Radau RK method} \end{array}\right. \end{aligned}$$
for \(t_n\le t_N\).
In the upper part of Fig. 6, we see the trajectory \(t_n\mapsto \left( y_1\left( t_{n}\right) ,y_2\left( t_{n}\right) \right) \) in the plane \({\mathbb {R}}^2\) for the first two components of the exact solution \(y(t_n)\), when \(t_{n}\in \left[ 8,10\right] \). In the middle and lower parts, we see the trajectory \(t_n\mapsto \left( y_{n,1} ,y_{n,2}\right) \) for the first two components of the numerical solution \(y_n\), when \(t_{n}\in \left[ 8,10\right] \): the middle part refers to the Gauss RK method and the lower part to the Radau RK method.
For the long-time \(t_{n}\in \left[ 8,10\right] \), where only the slow oscillation \(y^{\mathrm{long}}\) is present, the exact components \(y_1(t_n)\) and \(y_2(t_n)\) are equal and have order of magnitude \(10^{-4}\). The Gauss RK method exhibits numerical components \(y_{n,1}\) and \(y_{n,2}\) of order of magnitude \(10^0\). On the other hand, the Radau RK method exhibits accurate numerical components \(y_{n,1}\) and \(y_{n,2}\), although the stepsize is not suitable for approximating the fast oscillation.
In Fig. 7, we see the error \(\gamma _n\), for \(n=0,1,\ldots ,N\), for both approximants: for the Gauss RK method the error continues to grow and for the Radau RK method it goes down to \(\gamma _n^{\mathrm{long}}\).
2.2.1 Order star and stability region
Having fixed \(\lambda _1=-1+i\) and \(\lambda _3=-3+1000i\), we are interested in understanding for which approximants we have \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time. In our situation, this happens for the Radau RK method, but not for the Gauss RK method.
Order star and stability region for such approximants are shown in Fig. 8.
We have \(h\lambda _{3}\in {\mathscr {R}}\) for both methods, since they are A-stable. On the other hand, we have \(h\lambda _{3}\in {\mathscr {S}}^c\) only for the Radau RK method:
$$\begin{aligned} \left| S\left( h\lambda _3\right) \right| =\left\{ \begin{array}{l} 1.3494\quad \text { for the Gauss RK method}\\ 0.0270\quad \text { for the Radau RK method.} \end{array}\right. \end{aligned}$$
In Sect. 4, we will see a condition on the approximant for having \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time. When the condition does not hold, we have \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time. To have \(h\lambda _3\in {\mathscr {S}}^c\) (i.e., with reference to Fig. 8, the white region of the approximant contains \(h\lambda _3\)) is sufficient, but not necessary, for this condition on the approximant, and to have \(h\lambda _3\in {\mathscr {R}}\) is necessary, but not sufficient.
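The values of \(\left| S\left( h\lambda _3\right) \right| \) quoted above can be checked directly: the stability function of the two-stage Gauss method is the \((2,2)\)-Padé approximant of \(\mathrm{e}^{z}\) and, assuming the Radau method used is the two-stage Radau IIA method, its stability function is the \((1,2)\)-Padé approximant.

```python
import cmath

# Stability functions: (2,2)-Pade (two-stage Gauss) and, under the assumption
# stated above, (1,2)-Pade (two-stage Radau IIA); S(z) = R(z)/e^z.
def R_gauss(z):
    return (1 + z / 2 + z ** 2 / 12) / (1 - z / 2 + z ** 2 / 12)

def R_radau(z):
    return (1 + z / 3) / (1 - 2 * z / 3 + z ** 2 / 6)

h = 0.1
lam3 = -3 + 1000j
S_gauss = abs(R_gauss(h * lam3) / cmath.exp(h * lam3))  # 1.3494: not in S^c
S_radau = abs(R_radau(h * lam3) / cmath.exp(h * lam3))  # 0.0270: in S^c
```

Both stability functions satisfy \(\left| R\left( h\lambda _3\right) \right| <1\) (A-stability), so the difference between the two methods is visible only in the order-star quantity \(\left| S\left( h\lambda _3\right) \right| \).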
3 The appropriate definition of \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time
In the following, we assume to be in the base situation. Then, it is expected \(E_1\) small and \(h\rho _1\) non-large. To ease the exposition, we assume \(E_1\) small and \(h\rho _1\) non-large.
Since \(E_1\) is small, the error \(\gamma _{n}^{\mathrm{long}}\) grows linearly in time and remains small up to large times \(t_n\rho _1\).
(The number c plays a role similar to that of the number c appearing in Theorems 1.1, 1.2 and 1.3.) As a reference value for c, one can take \(c=1\). For the sake of generality, we do not confine c to this value alone. In all theorems below, it is stated for which \(c>0\) they are valid. However, when the theorems are applied, c is considered non-small, so we have
$$\begin{aligned} \tau \gg 1, \end{aligned}$$
and such that \(g(c)<1\), i.e. \(c<1.2564\), with \(1-g(c)\) non-small.
In order to describe the long-time behavior of the error \(\gamma _n\), we compare it to \(\gamma _n^{\mathrm{long}}\) and we are interested in whether or not \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time.
Here, “in the long-time” does not mean \(t_n\rho _1\rightarrow +\infty \). In fact, it is not of great interest to consider what happens for \(t_n\rho _1\rightarrow +\infty \), since \(\gamma _{n}^{\mathrm{long}}\) becomes non-small for a sufficiently large \(t_n\rho _1\). It is of interest to have \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) starting from times \(t_n\rho _1\) such that \(\gamma _{n}^{\mathrm{long}}\) is still small.
So, we introduce the following definition.
Definition 3.1
We say that \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time if, for some \(s\in [0,\tau )\), \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) for \(t_n\rho _1\) in the interval \([s,\tau ]\) and \(\gamma _{n}^{\mathrm{long}}\ll 1\) for \(t_n\rho _1\) up to the beginning of this interval, i.e. for \(t_n\rho _1\in [0,\kappa s]\) with \(\kappa \ge 1\) non-large.
In the definition, we consider times \(t_n\rho _1\) up to \(\tau \). Observe that if \(K_1\) is not large (recall (1.12) and that \(K_1=1\) is the generic situation for the matrix A), then the error \(\gamma _n^{\mathrm{long}}\) is not small for \(t_n\rho _1\) at the end of the interval \([0,\tau ]\).
In fact, for \(t_n\rho _1\in [\kappa \tau ,\tau ]\), where \(\kappa \in (0,1]\) is not small, by Theorem 1.2 we have
where \(e_n\) is such that \(\gamma _n=\gamma _n^{\mathrm{long}}(1+e_n)\).
Remark 3.1
In the previous definition, we also allow monitor functions \(s:(0,a]\times [b,+\infty )\rightarrow [0,+\infty )\), where \(0<a,b<+\infty \). In this case, we have to specify that (3.1) holds for \(\varepsilon \in (0,a]\) and (3.2) holds for \(\varepsilon \in (0,a]\) and \(\tau \ge b\).
3.2 What does the definition with monitor function mean?
Suppose \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time with monitor function s.
Let \(\varepsilon >0\). By (3.2), we see that \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) with degree \(\varepsilon \) for \(t_n\rho _1\in [s(\varepsilon ,\tau ),\tau ]\). Moreover, by Theorem 1.2, we see that if \(\frac{s(\varepsilon ,\tau )}{\tau }\ll 1\), then
for \(t_n\rho _1\in [0,\kappa s(\varepsilon ,\tau )]\), where \(\kappa \ge 1\) is not large, i.e. \(\gamma _n^{\mathrm{long}}\ll 1\) for \(t_n\rho _1\) up to the beginning of the interval \([s(\varepsilon ,\tau ),\tau ]\).
Regarding the satisfiability of \(\frac{s(\varepsilon ,\tau )}{\tau }\ll 1\), observe that s satisfies (3.1) and we have \(\tau \gg 1\).
4 Analysis of the long-time behavior of \(\gamma _n\)
In the paper [9], an analysis of the long-time behavior of the error \(\gamma _n\) was presented, important for the non-stiff situation. In the present paper, we develop another type of analysis, important for the stiff situation. In this new analysis, the complex numbers \(w_i\) and \(\alpha _i\) introduced below play an important role.
4.1 The numbers \(w_i\)
For any \(\lambda _{i}\in \varLambda ^{-}\), i.e. for any non-rightmost eigenvalue, we introduce the complex number
It is expected \(\vert \alpha \vert \) non-small. In fact, let \(\lambda _i\in \varLambda _j\), with \(j=2,\ldots ,q\), be a non-rightmost eigenvalue such that
with \(\vert e\vert \ll 1\), is “unlikely”. Observe that it is expected \(\vert \beta _j\vert \) non-small.
In the non-stiff situation, it is expected \(\alpha \) negative non-small. In fact, it is expected \(\vert \beta _2\vert \) non-small and, in the non-stiff situation, it is expected that the right-hand side of (4.4) is small and then it is expected \(\vert \alpha -\beta _{2}\vert \) small.
4.3 The basic theorem
The next theorem is, in our new analysis, the analog of Theorem 5.3 in [9] (which was suitable for studying the long-time behavior of \(\gamma _n\) in the non-stiff situation).
Theorem 4.1
Assume \(q>1\) and \(0\notin \varLambda _{1}\). Fix \(c>0\) such that \(g(c)<1\), i.e. \(c<1.2564\).
In the following, we continue to assume \(0\notin \varLambda _1\), but all our conclusions are valid (with easy adaptations) also for the case \( 0\in \varLambda _1\) and \(\varLambda _1\ne \{0\}\).
4.4 A first result
We give a first theorem about \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time with a monitor function.
Theorem 4.2
Assume \(q>1\) and \(0\notin \varLambda _1\). Fix \(c>0\) such that \(g(c)<1\), i.e. \(c<1.2564\).
We have \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time with monitor function
By inverting the function f, we obtain the monitor function (4.10). \(\square \)
Remark 4.2
1.
For \(\Vert Q_1\widehat{y}_0\Vert _2\) sufficiently close to 1, we have \(s(\varepsilon )<0\). There are two ways for dealing with this. One is to redefine \(s(\varepsilon )\) as 0 when \(s(\varepsilon )<0\). The other is to use (0, a] as domain of s, where \(s(a)=0\). So, we have \(s(\varepsilon )\ge 0\) for \(\varepsilon \in (0,a]\).
2.
By (4.10), (1.14) and (1.12), one can easily prove Theorem 1.3.
The previous theorem with \(c=1\) gives the following results.
then \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) for \(t_n\rho _1\) in the interval \([s,\tau ]\) and \(\gamma _n^{\mathrm{long}}\ll 1\) for \(t_n\rho _1\) up to the beginning of this interval. In particular, we have \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) with degree \(\varepsilon \) for \(t_n\rho _1\in [s,\tau ]\) and
It is expected that the last three conditions in (4.13) are satisfied. In fact, since \(E_1\ll 1\), they are not satisfied only in “extreme” cases. Moreover, in the non-stiff situation, it is expected that the first condition is satisfied. In fact, since in the non-stiff situation it is expected \(\frac{\max \nolimits _{\lambda _i\in \varLambda ^{-}}\left| \sigma _{i}\right| }{h\rho _1}\ll 1\), the first condition is not satisfied only in “extreme” cases.
So, we can state the following important conclusion.
Conclusion 4.5
Suppose that we are in the non-stiff situation. It is expected that \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time.
where W and \(\alpha \) are defined in (4.1) and (4.2), respectively.
Next theorem shows that, under the condition A, we have \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time with a new monitor function different from (4.10).
Theorem 4.6
Assume \(q>1\) and \(0\notin \varLambda _1\). Fix \(c>0\) such that \(g(c)<1\), i.e. \(c<1.2564\).
If A holds, then \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time with monitor function
defined for \(x\ge 1\) and for \(\varepsilon >0\) such that the right-hand side of (4.14) with \(x=1\) is greater than or equal to 1, so that \(s(\varepsilon ,x)\ge 1\) for \(x\ge 1\) and such \(\varepsilon \).
Proof
Suppose that A holds. Let \(\tau \ge 1\) and let \(s\in [1,\tau ]\). For \(t_{n}\rho _1 \in \left[ s,\tau \right] \), in (4.5) of Theorem 4.1 we have
By looking at the proof of the previous theorem, we see that there is also a monitor function \(s(\varepsilon ,x)\) defined for all \(\varepsilon >0\) and \(x>0\). It is obtained by inverting with respect to s the upper bound
of \(\vert e_n\vert \), where \(e_n\) is given in Theorem 4.1. Observe that the inverse exists since \(\frac{\mathrm{e}^{-\min \{\vert \alpha \vert ,\vert \beta _2\vert \}s}}{s}\) is a strictly decreasing function of s. This new monitor function has the advantage that it is no longer necessary to suppose \(\varepsilon \le a\) (where is a given at point 1) above) and \(\tau \ge 1\). However, we prefer to use the old monitor function (4.14) because it has an explicit expression.
The previous theorem with \(c=1\) gives the next important results.
Theorem 4.7
Suppose A holds. Let \(\tau =\frac{1}{E_1}\), let \(k>0\) and let
then \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) for \(t_n\rho _1\) in the interval \([\max \{1,s\},\tau ]\) and \(\gamma _n^{\mathrm{long}}\ll 1\) for \(t_n\rho _1\) up to the beginning of this interval. In particular, we have \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) with degree \(\varepsilon \) for \(t_n\rho _1\in [\max \{1,s\},\tau ]\) and
Theorem 4.6 says that \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) with degree \(\varepsilon \) for \(t_n\rho _1\in [s,\tau ]\). If \(s<1\), consider \(\overline{k}\), with \(\overline{k}> k\), such that
We have \(s(\overline{\varepsilon },\tau )=1\), where \(\overline{\varepsilon }=\mathrm{e}^{-\overline{k}}\), with \(\overline{\varepsilon }< \varepsilon \). Theorem 4.6 says that \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) with degree \(\overline{\varepsilon }\), and then with degree \(\varepsilon \), for \(t_n\rho _1\in [1,\tau ]\). \(\square \)
then \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time.
Proof
Use the previous theorem with \(\varepsilon =\frac{1}{\tau }=E_1\ll 1\). \(\square \)
It is expected that if A holds, then the three conditions in (4.16) are satisfied. In fact, since \(E_1\ll 1\), they are not satisfied only in “extreme” cases. So, it is expected that if A holds then \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time. Of course, we already know that in the non-stiff situation it is expected \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time, independently of the condition A, just as we know that it is expected that A holds in the non-stiff situation.
So, what is really important is the following conclusion.
Conclusion 4.9
Suppose that we are in the stiff situation. It is expected that if A holds, then \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time.
4.5.1 Order star and stability region
The condition A can be related to the order star and the stability region of the approximant (recall the beginning of Sect. 2).
Theorem 4.10
Let \({\mathscr {S}}^c\) be the complementary set of the order star of the approximant. The condition
$$\begin{aligned} h\lambda _{i}\in {\mathscr {S}}^{c} \text { for any }\lambda _{i}\in \varLambda ^- \end{aligned}$$
implies A.
Proof
Let \(\lambda _{i}\in \varLambda ^{-}\). If \(h\lambda _{i}\in {\mathscr {S}}^{c}\), then \(|w_{i}|<1\). In fact, if \(h\lambda _{i}\in {\mathscr {S}}^{c}\), then
In the next section, we study what happens when A does not hold. Of course, when A does not hold, it is expected that B holds.
5 When the condition A does not hold
The next theorem helps to say when \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\). We exclude the time \(t_n\rho _1=0\), i.e. the index \(n=0\), since for \(n=0\) the ratio \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\) is the indeterminate form \(\frac{0}{0}\).
5.1 Definition of \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time
Here, by “for all time” we do not mean for all times \(t_n\rho _1\), since in our analysis we consider \(t_n\rho _1\) only up to \(\tau \). So, we introduce the following definition.
Definition 5.1
We say that \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time if \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for \(t_n\rho _1\in (0,\tau ]\).
This definition is made more precise by using a monitor function.
Definition 5.2
Let \(F:(0,+\infty )\rightarrow (0,+\infty )\) be a function such that
then \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for \(t_n\rho _1\in (0,\tau ]\). In particular, we have \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\ge F(\tau )\) for \(t_n\rho _1\in (0,\tau ]\).
then \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time.
In the stiff situation, it is expected that if B holds, then (5.5) holds. In fact, \(E_1\ll 1\) and it is expected \(\vert \alpha \vert \) non-small. So, we can state the following important conclusion.
Conclusion 5.5
Suppose that we are in the stiff situation. It is expected that if B holds, then \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time.
The all-time lower bound (5.4) of the ratio \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\) is proportional to \(\alpha \tau \). At the end of the interval \([0,\tau ]\), this ratio has a lower bound exponential in \(\alpha \tau \).
In fact, by Theorem 5.1, we see that for \(t_n\rho _1\in [\kappa \tau ,\tau ]\), where \(\kappa \in (0,1]\) is not small,
Although it is expected that C does not hold, we study the condition C anyway, since it characterizes the transition between \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time and \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time.
For the condition C, we need a weak form of \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time.
Definition 5.3
Let \(S:(0,+\infty )\rightarrow (0,+\infty )\) be a function such that
We say that S-weakly \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time if \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for \(t_n\rho _1\in (0,S(\tau )]\).
Here is the definition with a monitor function.
Definition 5.4
Let \(S,F:(0,+\infty )\rightarrow (0,+\infty )\) be functions such that
then \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for \(t_n\rho _1\in (0,\tau ^{v}]\). In particular, we have \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\ge F(\tau )\) for \(t_n\rho _1\in (0,\tau ^{v}]\).
Theorem 5.8
Suppose C holds. Let \(v\in \left( 0,1\right) \). If
then S-weakly \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time, where \(S(x)=x^v,\ x\ge 1\).
Suppose that we are in the stiff situation and that C holds with \(E_1^{1-v}\ll 1\). Then it is expected that (5.8) holds. So, we can state the following conclusion.
Conclusion 5.9
Let \(S(x)=x^v,\ x\ge 1\), with \(v\in (0,1)\) such that \(E_1^{1-v}\ll 1\). Suppose that we are in the stiff situation and that C holds. It is expected that S-weakly \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time.
6 Examples revisited
Now, we look at the two examples of Sect. 2 in the light of the results of Sects. 4 and 5.
6.1 Same approximant with different ODEs
The conditions A, B and C are \(W<1\), \(W>1\) and \(W=1\), respectively, where
for \(t_n\rho _1\in [0,\kappa s]\), where \(\kappa \ge 1\) is not large. We have \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time. Observe that the values s agree with Figs. 2 and 3.
In (P3), the condition B holds. The value of the monitor function (5.4) is \(F(\tau )=8.79\cdot 10^5\). We have \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\ge F(\tau )\) for \(t_n\rho _1=t_n\in (0,\tau ]\) and so \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time.
6.1.1 The region \({\mathscr {R}}_{hr_1}\)
Recall Sect. 4.5.2. The condition A can be stated as
The region \({\mathscr {R}}_{-0.2}\) is shown in Fig. 9 (compare with Fig. 5 showing \({\mathscr {R}}_0 =\overset{\circ }{{\mathscr {R}}}\)). The part of \({\mathscr {R}}_{-0.2}\cap (-\infty ,-0.2)\) in the white finger corresponds to the sufficient condition \(hb\in {\mathscr {S}}^{c}\). Out of the white finger, we have an additional range of values for hb guaranteeing the condition A. The border value for b between the conditions A and B, where the condition C holds, is \(b=-15.565\). Observe that we are out of the white finger for \(b<-11.887\) and out of the stability region for \(b<-16.085\).
6.2 Same ODE with different approximants
The conditions A, B and C are \(W<1\), \(W>1\) and \(W=1\), respectively, where
For the Gauss RK method, the condition B holds. The value of the monitor function (5.4) is \(F(\tau )=3.31\cdot 10^5\). We have \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\ge F(\tau )\) for \(t_n\rho _1=\sqrt{2}t_n\in (0,\tau ]\) and so \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time.
For the Radau RK method, the condition A holds. The value of s in (4.15) relevant to \(k=3\) is \(s=9.25\). We have \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) with degree \(\varepsilon =4.98\cdot 10^{-2}\) for \(t_n\rho _1=\sqrt{2}t_n\in [s,\tau ]\) and
for \(t_n\rho _1\in [0,\kappa s]\), where \(\kappa \ge 1\) is not large. We have \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time. With reference to Fig. 6, we have \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) with degree \(\varepsilon \) for \(t_n\in [8,10]\), i.e. for \(t_n\rho _1=\sqrt{2}t_n\in [11.31,14.14]\).
The region \({\mathscr {R}}_{-0.1}\) for the two methods is shown in Fig. 10. In the left part of the figure, we see that the region for the Gauss RK method does not cover points with large imaginary part on the line
On the other hand, in the right part, we see that the region for the Radau RK method completely includes this line.
7 Independence of the non-rightmost spectrum
In this section, we study when the condition A holds independently of the particular non-rightmost spectrum \(\varLambda ^{-}\).
Here, we consider an analytic approximant R with domain \({\mathscr {D}}\) such that \(\{z\in {\mathbb {C}}:\mathrm{Re}\left( z\right) < \beta _R\}\subseteq {\mathscr {D}}\) for some \(\beta _R\in (0,+\infty ]\), i.e. \({\mathscr {D}}\) includes a left half-plane.
7.1 The property A(x)
We introduce the property \(\mathrm{A}(x)\) of the approximant R.
Definition 7.1
Let \(x<\beta _R\). Let
$$\begin{aligned} \mathrm{A}(x)\ \ \overset{\mathrm{def}}{\Longleftrightarrow } \ \ \mathrm{e}^{-x}\left| R\left( z\right) \right|<1\text { for all }z\in {\mathbb {C}}\text { such that } \mathrm{Re}\left( z\right) <x, \end{aligned}$$
where \(\overset{\mathrm{def}}{\Longleftrightarrow }\) has the meaning of “if and only if” by definition.
The property \(\mathrm{A}(x)\) can be also written as
where \({\mathscr {R}}_{x}\) is the region defined in (4.18). Observe that A(0) is the A-stability property.
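A small numerical sketch of the property \(\mathrm{A}(x)\), using the (1,1) diagonal Padé approximant \(R(z)=(1+z/2)/(1-z/2)\) as a hypothetical test case: A(0) holds (this is A-stability), while A(x) already fails for a small negative x, as Theorem 7.1 below predicts (the specific sample point exhibiting the failure is our choice).

```python
import math, random

R = lambda z: (1 + z/2) / (1 - z/2)   # (1,1) diagonal Pade, a hypothetical test case

# A(0), i.e. A-stability: |R(z)| < 1 for Re z < 0 (spot check by sampling)
random.seed(0)
for _ in range(10000):
    z = complex(random.uniform(-50.0, -1e-6), random.uniform(-50.0, 50.0))
    assert abs(R(z)) < 1

# A(x) fails for a small x != 0: a point of the order star near the origin
# with Re z < x but e^{-x}|R(z)| >= 1
x = -0.01
z = -0.0101 + 0.3j
assert z.real < x and math.exp(-x) * abs(R(z)) >= 1
print("A(0) holds on the sample; A(-0.01) is violated at", z)
```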
The property \(\mathrm{A}(x)\) is important because \(\mathrm{A}(hr_{1})\) implies the condition A for all non-rightmost spectra \(\varLambda ^{-}\).
It is of interest to consider the property \(\mathrm{A}(x)\) for \(\vert x\vert \) non-large. In fact, \(\vert hr_{1}\vert \le h\rho _1\) and we are assuming \(h\rho _1\) non-large.
We have the following negative result.
Theorem 7.1
There exists \(x_0>0\) such that, for \(x<\beta _R\) with \(\vert x\vert \le x_0\) and \(x\ne 0\), \(\mathrm{A}(x)\) is not true.
Proof
Recall that l is the order of the approximant R. In the complex plane, there exists a small disk centered at the origin which consists of \(l+1\) sectors of width \(\frac{\pi }{l+1}\) included in the order star \({\mathscr {S}}\), alternating with \(l+1\) sectors of width \(\frac{\pi }{l+1}\) included in \( {\mathscr {S}}^{c}\). Thus, there exists \(x_0>0\) such that, for \(x<\beta _R\) with \(\vert x\vert \le x_0\) and \(x\ne 0\), the line \(\mathrm{Re}(z)=x\) has a non-empty intersection with the order star \({\mathscr {S}}\). Let w be a point in this intersection. We have \(\mathrm{Re}\left( w\right) =x\) and
Then, due to the continuity of R, there exists \(\varepsilon >0\) such that, for any \(z\in {\mathbb {C}}\) with \(x-\varepsilon \le \mathrm{Re} \left( z\right) \le x\) and \(\mathrm{Im}\left( z\right) =\mathrm{Im} \left( w\right) \), we have
Theorem 7.1 says that, for any \(x<\beta _R\) with \(\vert x\vert \le x_0\) and \(x\ne 0\), there exists \(z\in {\mathbb {C}}\) with \(\mathrm{Re}(z)<x\) such that \(\mathrm{e}^{-x}\vert R(z)\vert \ge 1\). So, for some normal matrix A, we can have a situation where the rightmost real part is \(r_{1}=\frac{x}{h}\) and \(\lambda _{i}=\frac{z}{h}\) is a non-rightmost eigenvalue. For this eigenvalue, we have
and it is expected \(\vert \beta _2 \vert \) non-small.
So, the negative result of Theorem 7.1 is not disastrous. The theorem says that, for any rightmost real part \(r_1\ne 0\) with \(\vert hr_1\vert \le x_0\), there is a situation where we have a non-rightmost eigenvalue \(\lambda _i\) such that \(\vert w_i\vert \ge 1\). But such an eigenvalue could be non-significant and, if this is the case, it is expected that such a situation does not occur.
In Sect. 7.10 below, we will introduce a condition on the approximant under which any non-rightmost eigenvalue \(\lambda _i\) with \(\vert w_i\vert \ge 1\) is non-significant.
7.3 The properties \(\mathrm{A}(x,a)\) and \(\mathrm{B}(x,a)\)
It is expected that any non-rightmost eigenvalue \(\lambda _i\) with \(\left| w_{i}\right| \ge 1\) has \(\vert h\lambda _i\vert \) non-small. In fact, it is expected that \(\lambda _i\) is significant, i.e. it is expected that
is not small, and then it is “unlikely” to have \(\vert h\lambda _i\vert \) small.
Thus, we look at condition A for a non-rightmost spectrum \(\varLambda ^{-}\) with all the eigenvalues \(\lambda _i\) such that \(\vert h\lambda _i\vert \) is not small. In this context, the following two properties of the approximant R are important.
Definition 7.3
Let \(x<\beta _{R}\) and let \(a\ge 0\). Let
$$\begin{aligned}&\mathrm{A}(x,a)\ \ \overset{\mathrm{def}}{\Longleftrightarrow }\ \ \mathrm{e}^{-x}\left| R\left( z\right) \right|<1\text { for all }z\in {\mathbb {C}}\text { such that }\mathrm{Re}\left( z\right)<x \text { and}\ \left| z\right| \ge a\\&\mathrm{B}(x,a)\ \ \overset{\mathrm{def}}{\Longleftrightarrow }\ \ \mathrm{e}^{-x}\left| R\left( z\right) \right| >1\text { for all }z\in {\mathbb {C}}\text { such that }\mathrm{Re}\left( z\right) <x\text { and}\ \left| z\right| \ge a. \end{aligned}$$
The properties \(\mathrm{A}(x,a)\) and \(\mathrm{B}(x,a)\) can be also written as
The properties \(\mathrm{A}(x,a)\) and \(\mathrm{B}(x,a)\) are important because \(\mathrm{A}(hr_{1},a)\) implies the condition A for all non-rightmost spectra \(\varLambda ^{-}\) such that \(h\mu ^{-}\ge a\) and \(\mathrm{B}(hr_{1},a)\) implies the condition B for all non-rightmost spectra \(\varLambda ^{-}\) such that \(h\rho ^{-}\ge a\). Recall that \(\mu ^{-}\) and \(\rho ^{-}\) are defined in (1.5).
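The two properties can be explored by sampling. The following sketch uses as hypothetical test cases the implicit Euler approximant \(R(z)=1/(1-z)\) for \(\mathrm{A}(x,a)\) and the Taylor approximant of degree 2 for \(\mathrm{B}(x,a)\); the values of x and a are our choices, for illustration only.

```python
import math, random

def no_counterexample(R, x, a, violated, trials=100000, box=20.0):
    # search for z with Re z < x and |z| >= a violating the property
    random.seed(3)
    for _ in range(trials):
        z = complex(random.uniform(-box, box), random.uniform(-box, box))
        if z.real < x and abs(z) >= a and violated(math.exp(-x) * abs(R(z))):
            return False
    return True

x = -0.2
# A(x, a) for implicit Euler R(z) = 1/(1-z): holds once a exceeds roughly 0.303
# (cf. the number a(x) of Sect. 7.10)
A_ok = no_counterexample(lambda z: 1 / (1 - z), x, a=0.31,
                         violated=lambda v: v >= 1)
# B(x, a) for the degree-2 Taylor approximant (which has L = +inf): holds for a = 5
B_ok = no_counterexample(lambda z: 1 + z + z**2 / 2, x, a=5.0,
                         violated=lambda v: v <= 1)
print(A_ok, B_ok)   # True True
```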
If \(L>\mathrm{e}^{hr_{1}}\), then for any \(\theta \in \left( 1,\mathrm{e}^{-hr_{1}}L\right) \) and for any non-rightmost spectrum \(\varLambda ^{-}\) satisfying \(h\rho ^{-}\ge a\), where a is given in (7.4) with \(x=hr_1\), the condition B holds with
Observe that, by varying \(\theta \) in \(\left( 1,\mathrm{e}^{-hr_{1}} L\right) \), the lower bound \(\frac{\log \theta }{h\rho _1}\) of \(\alpha \) can be arbitrarily close from below to the positive number
If \(L<\mathrm{e}^{hr_{1}}\), then for any \(\theta \in \left( \mathrm{e}^{-hr_{1}}L,1 \right) \) and for any non-rightmost spectrum \(\varLambda ^{-}\) satisfying \(h\mu ^{-}\ge a\), where a is given in (7.6) with \(x=hr_1\), the condition A holds with
where \(L_{\mathrm{inf}}=\inf \nolimits _{\mathrm{Re}(z)<hr_1}\vert R(z)\vert \).
Proof
Suppose \(h\mu ^{-}\ge a\). For a non-rightmost eigenvalue \(\lambda _i\) such that \(\alpha =\mathrm{Re}(\alpha _i)\), we have \(\vert h\lambda _i\vert \ge a\) and then
Observe that, by varying \(\theta \) in \(\left( \mathrm{e}^{-hr_{1}}L,1 \right) \), the upper bound \(\frac{\log \theta }{h\rho _1}\) of \(\alpha \) can be arbitrarily close from above to the negative number
If, in addition, \(L=L_{\mathrm{inf}}\), then \(\alpha \) is not smaller than this negative number and \(\alpha \) can be arbitrarily close to it.
7.7 Approximants with \(L=+\infty \)
Consider approximants with \(L=+\infty \). Examples of such approximants are Taylor approximants and superdiagonal Padé approximants.
The results in Sect. 7.5 say that the condition B holds for \(h\rho ^{-}\) sufficiently away from zero, as confirmed in the first example of Sect. 2. In particular, B holds for
As \(h\rho ^{-}\rightarrow +\infty \), B holds with \(\alpha \rightarrow +\infty \).
7.8 Approximants with \(L=0\)
Consider approximants with \(L=0\). Examples of such approximants are subdiagonal Padé approximants. Radau and Lobatto IIIC RK methods correspond to the first and second subdiagonal Padé approximants, respectively.
The results in Sect. 7.6 say that the condition A holds for \(h\mu ^{-}\) sufficiently away from zero. In particular, A holds for
As \(h\mu ^{-}\rightarrow +\infty \), A holds with \(\alpha \rightarrow -\infty \). Moreover, Theorem 7.1 says that, for any rightmost real part \(r_1\ne 0\) with \(\vert hr_1\vert \le x_0\), we cannot have that A holds for all \(h\mu ^{-}\).
A-stable approximants with \(L=0\) are called L-stable (see [5]) and they are considered particularly suitable for integrating very stiff ODEs (see [1, 3, 8, 11]). Observe that here we are also considering approximants with \(L=0\) which are not A-stable. Indeed, the A-stability property does not play a crucial role in this context. Among subdiagonal Padé approximants, only the first and second subdiagonal Padé approximants (Radau and Lobatto IIIC methods) are A-stable.
7.9 Approximants with \(L=1\)
Consider approximants with \(L=1\). Examples of approximants with \(L=1\) are diagonal Padé approximants, which are also A-stable. Gauss methods correspond to the diagonal Padé approximants.
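The three values of \(L=\lim \nolimits _{z\rightarrow \infty }\vert R(z)\vert \) discussed in Sects. 7.7–7.9 can be illustrated by evaluating \(\vert R(z)\vert \) far out on the negative real axis, with one representative approximant per family (a sketch; the representatives are our choices):

```python
taylor2 = lambda z: 1 + z + z**2 / 2        # Taylor of degree 2:      L = +inf
ieuler  = lambda z: 1 / (1 - z)             # (0,1) subdiagonal Pade:  L = 0
diag11  = lambda z: (1 + z/2) / (1 - z/2)   # (1,1) diagonal Pade:     L = 1

z = -1e6                                    # far out on the negative real axis
L_values = (abs(taylor2(z)), abs(ieuler(z)), abs(diag11(z)))
print(L_values)   # very large, nearly 0, nearly 1
```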
Suppose \(r_{1}<0\). The results in Sect. 7.5 say that the condition B holds for \(h\rho ^{-}\) sufficiently away from zero, as confirmed in the second example of Sect. 2. In particular, B holds for
For an A-stable approximant, B holds with \(\alpha \le -\frac{r_{1}}{\rho _1}\) and, as \(h\rho ^{-}\rightarrow +\infty \), \(\alpha \rightarrow -\frac{r_{1}}{\rho _1}\).
Suppose \(r_1>0\). The results in Sect. 7.6 say that the condition A holds for \(h\mu ^{-}\) sufficiently away from zero. In particular, A holds for
For an A-stable approximant, A holds with \(\alpha \ge -\frac{r_{1}}{\rho _1}\) and, as \(h\mu ^{-}\rightarrow +\infty \), \(\alpha \rightarrow -\frac{r_{1}}{\rho _1}\).
7.10 Non-significant eigenvalues II
In this subsection we study when any non-rightmost eigenvalue \(\lambda _i\) with \(\vert w_i\vert \ge 1\) is non-significant (see Sect. 7.2).
where \({\mathscr {R}}_{x}^{c}\) is the complementary set of \({\mathscr {R}}_x\).
We have \(\mathrm{A}(x)\) if and only if \({\mathscr {P}}_{x}=\emptyset \). Moreover, for \(a\ge 0\), we have \(\mathrm{A}(x,a)\) if and only if the open disk of radius a centered at the origin includes \({\mathscr {P}}_{x}\).
The importance of the region \({\mathscr {P}}_{x}\) is due to the fact that, for a non-rightmost eigenvalue \(\lambda _{i}\), we have \(|w_{i}|\ge 1\) if and only if \(h\lambda _{i}\in {\mathscr {P}}_{hr_{1}}\).
In other words, a(x) is the infimum of the radii of open disks centered at the origin and including \({\mathscr {P}}_x\).
The importance of the number a(x) is given by the following theorem.
Theorem 7.8
For a non-rightmost eigenvalue \(\lambda _i\) such that \(\vert w_i\vert \ge 1\), we have \(\vert h\lambda _i\vert \le a\left( hr_{1}\right) \).
Proof
The closed disk of radius a(x) centered at the origin includes the region \({\mathscr {P}}_{x}\). The theorem follows by recalling that \({\mathscr {P}}_{x}\) contains the non-rightmost eigenvalues \(\lambda _i\) such that \(\vert w_i\vert \ge 1\). \(\square \)
7.10.3 The theorem on the non-significant eigenvalues
The next theorem says when any non-rightmost eigenvalue \(\lambda _i\) with \(\vert w_i\vert \ge 1\) is non-significant. It involves the behavior of a(x) as \(x\rightarrow 0\).
The theorem now follows by recalling the definition of non-significant eigenvalue. \(\square \)
Remark 7.3
The term \(O(h\rho _1)\) in (7.8) is not larger than \(C h\rho _1\) for \(h\rho _1\le D\), where \(C\ge 0\) and \(D>0\) depend only on the approximant.
By the previous theorem we obtain the following important conclusion.
Conclusion 7.10
Suppose that the approximant satisfies (7.7). It is expected that A holds.
In fact, suppose A does not hold, i.e. there is a non-rightmost eigenvalue \(\lambda _i\) with \(\vert w_i\vert \ge 1\). It is expected that this eigenvalue is significant. On the other hand, if it is significant, then, by the previous theorem, we obtain that (7.8) does not hold and this is “unlikely”.
In the next subsection, we show that the implicit Euler method satisfies (7.7).
7.11 The implicit Euler method
We examine the property \(\mathrm{A}(x)\) and determine the number a(x) for the implicit Euler method, corresponding to the \(\left( 0,1\right) \)-Padé approximant
The region \({\mathscr {R}}_x\), \(x<1\), for this approximant is the exterior of the disk of center 1 and radius \(\mathrm{e}^{-x}\), and the region \({\mathscr {P}}_x\) is the part of the closed disk to the left of the line \(\mathrm{Re}(z)=x\) (see Fig. 11).
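This geometric description can be checked numerically (a sketch; the value \(x=-0.2\) is an arbitrary choice): for \(R(z)=1/(1-z)\), membership in \({\mathscr {R}}_x\), i.e. \(\mathrm{e}^{-x}\vert R(z)\vert <1\), coincides with lying outside the disk of center 1 and radius \(\mathrm{e}^{-x}\).

```python
import math, random

R = lambda z: 1 / (1 - z)   # implicit Euler: the (0,1)-Pade approximant
x = -0.2                    # an arbitrary value x < 1

random.seed(4)
for _ in range(10000):
    z = complex(random.uniform(-5, 5), random.uniform(-5, 5))
    if z == 1:
        continue                                  # the pole of R
    in_Rx = math.exp(-x) * abs(R(z)) < 1          # z in R_x
    outside_disk = abs(1 - z) > math.exp(-x)      # z outside disk(1, e^{-x})
    assert in_Rx == outside_disk
print("R_x is the exterior of the disk of center 1 and radius", math.exp(-x))
```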
Theorem 7.11
Let \(x<1\). For the implicit Euler method, we have \(\mathrm{A}(x)\) if and only if \(x=0\). Moreover, we have
since the complementary set \({\mathscr {R}}_x^c\) of \({\mathscr {R}}_x\) is the closed disk of center 1 and radius \(\mathrm{e}^{-x}\) (see Fig. 11). Thus A(x) is not true.
For the second part, let \(b\ge 0\). An easy computation shows that, for \(z\in {\mathbb {C}}\) such that \(\vert z\vert =b\), we have
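The number a(x) can also be located numerically by scanning the boundary arc of \({\mathscr {P}}_x\). We stress that the closed form compared against below, \(\sqrt{\mathrm{e}^{-2x}+2x-1}\), is our own derivation from the disk geometry (the farthest points of \({\mathscr {P}}_x\) from the origin are the two corners \(x\pm i\sqrt{\mathrm{e}^{-2x}-(1-x)^2}\)), not a formula quoted from the theorem:

```python
import cmath, math

x = -0.2
r = math.exp(-x)               # radius of the disk R_x^c (center 1)
M = 200000                     # boundary-arc sample points

# scan z = 1 + r e^{it} on the circle, keeping the part with Re z <= x,
# i.e. the curved boundary of P_x, and record the largest |z|
best = 0.0
for k in range(M):
    t = 2 * math.pi * k / M
    z = 1 + r * cmath.exp(1j * t)
    if z.real <= x:
        best = max(best, abs(z))

candidate = math.sqrt(math.exp(-2 * x) + 2 * x - 1)   # our derived closed form
print(best, candidate)         # both close to 0.3030
```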
We can conclude that it is expected that A holds for the implicit Euler method.
8 Conclusions
In the stiff situation, we have studied the long-time behavior of the relative error in the numerical integration of the ODE (1.1) with A normal. The numerical integration is accomplished over a mesh of constant stepsize h, by using at any step an analytic approximant R of the exponential: see (1.2). The relative error \(\gamma _n\) of the numerical integration is given in (1.3).
We have defined the long-time solution \(y^{\mathrm{long}}\) as the solution of (1.1) projected on the eigenspace of the rightmost eigenvalues and we have considered the relative error \(\gamma _n^{\mathrm{long}}\) of the numerical integration of \(y^{\mathrm{long}}\). The error \( \gamma _n^{\mathrm{long}}\) grows linearly in time, it is small and it remains small in the long-time.
where \(r_1\) is the real part of the rightmost eigenvalues of A. When A holds, we have \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time. When A does not hold, we have \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time.
Let \(L=\lim \nolimits _{z\rightarrow \infty }\vert R(z)\vert \). In order to have the condition A satisfied, it is better to use approximants with \(L=0\) (for example, Radau and Lobatto IIIC methods). Approximants with \(L=1\) (for example, Gauss methods) do not work well when \(r_1<0\).
The paper [10] analyzes the numerical integration in the stiff situation by looking at a different question. In [10], the interest is in numerical approximations (1.2) of the long-time solution starting from a perturbed initial value. The approximants are analyzed by means of their error growth function \(\varphi _R\) (see [4, 5]) in order to study how they propagate the initial perturbation from the relative error point of view. In this other context, we have a non-large propagation of the initial perturbation if and only if
We have considered the case of A normal. Some numerical experiments, not included here, suggest that, also for non-normal matrices, we have \(\gamma _n\approx \gamma _n^{\mathrm{long}}\) in the long-time when the condition A holds and \(\frac{\gamma _n}{\gamma _n^{\mathrm{long}}}\gg 1\) for all time when A does not hold. In light of this, the results of Sect. 7 become more important, since they are about the condition A.
We conclude by remarking that the findings of this paper are interesting in applications involving differential models described by linear ODEs with \(r_1\ne 0\). In particular, they are interesting when we are integrating an ODE whose solution decreases to small orders of magnitude (case \(r_1<0\)) but is not yet considered zero, or grows to large orders of magnitude (case \(r_1>0\)) but is not yet considered infinite.