1 Introduction

Given a random variable \(X\) with an absolutely continuous density \(p\), the Fisher information of \(X\) (or its distribution) is defined by

$$\begin{aligned} I(X) = I(p) = \int _{-\infty }^\infty \frac{p^{\prime }(x)^2}{p(x)} \, dx, \end{aligned}$$

where \(p^{\prime }\) denotes a Radon–Nikodym derivative of \(p\). In all other cases, let \(I(X) = \infty \).

With the first two moments of \(X\) being fixed, this quantity is minimized for the normal distribution (which is a variant of Cramér–Rao’s inequality). That is, if \(\mathbf{E}X = a\) and \(\mathrm{Var}(X) = \sigma ^2\), then we have \(I(X) \ge I(Z)\) for \(Z \sim N(a,\sigma ^2)\) with density

$$\begin{aligned} \varphi _{a,\sigma }(x) = \frac{1}{\sqrt{2 \pi \sigma ^2}}\ e^{-(x-a)^2/2\sigma ^2}. \end{aligned}$$

Moreover, the equality \(I(X) = I(Z)\) holds if and only if \(X\) is normal.

In many applications the relative Fisher information

$$\begin{aligned} I(X||Z) = I(X) - I(Z) = \int _{-\infty }^\infty \left( \frac{p^{\prime }(x)}{p(x)} - \frac{\varphi _{a,\sigma }^{\prime }(x)}{\varphi _{a,\sigma }(x)}\right) ^2\,p(x)\,dx \end{aligned}$$

is used as a strong measure of non-Gaussianity of \(X\). For example, it dominates the relative entropy, or Kullback–Leibler distance, of the distribution of \(X\) from that of \(Z \sim N(a,\sigma ^2)\); more precisely (cf. Stam [S]),

$$\begin{aligned} \frac{\sigma ^2}{2}\,I(X||Z) \, \ge \, D(X||Z) = \,\int _{-\infty }^\infty p(x) \log \frac{p(x)}{\varphi _{a,\sigma }(x)}\ dx. \end{aligned}$$
(1.1)
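Both sides of (1.1) are easy to evaluate numerically for simple non-Gaussian laws. The following rough sketch (an illustration of ours, with ad hoc parameters and plain grid quadrature) computes \(I(X||Z)\) and \(D(X||Z)\) for a symmetric two-component normal mixture and checks the inequality:

```python
# Illustrative numerical check (ours) of the definitions of I(X||Z), D(X||Z) and of
# inequality (1.1) for X distributed as a symmetric two-component normal mixture.
import numpy as np

m, s = 1.0, 0.8
sigma2 = m**2 + s**2                         # E X = 0 and Var(X) = m^2 + s^2

x = np.linspace(-15, 15, 100001)
dx = x[1] - x[0]

def normal(x, mu, var):
    return np.exp(-(x - mu)**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

p   = 0.5 * normal(x, -m, s**2) + 0.5 * normal(x, m, s**2)          # density of X
dp  = 0.5 * normal(x, -m, s**2) * (-(x + m) / s**2) \
    + 0.5 * normal(x, m, s**2) * (-(x - m) / s**2)                  # p'(x)
phi = normal(x, 0.0, sigma2)                                        # density of Z ~ N(0, sigma^2)

I_rel = np.sum((dp / p + x / sigma2)**2 * p) * dx                   # I(X||Z)
D_rel = np.sum(p * np.log(p / phi)) * dx                            # D(X||Z)
print(I_rel, D_rel, sigma2 / 2 * I_rel >= D_rel)                    # the last entry should be True
```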

We consider the scheme of independent identically distributed random variables \((X_n)_{n \ge 1}\). Assuming that \(\mathbf{E}X_1 = 0\) and \(\mathrm{Var}(X_1) = 1\), define the normalized sums

$$\begin{aligned} Z_n = \frac{X_1 + \cdots + X_n}{\sqrt{n}}. \end{aligned}$$

Since the \(Z_n\) converge in distribution to \(Z \sim N(0,1)\), one may wonder whether the convergence holds in a stronger sense. A remarkable observation in this respect is due to Barron and Johnson, who proved in [3] that

$$\begin{aligned} I(Z_n) \rightarrow I(Z), \quad \mathrm{as} \ \ n \rightarrow \infty , \end{aligned}$$
(1.2)

i.e., \(I(Z_n||Z) \rightarrow 0\), if and only if \(I(Z_{n_0})\) is finite for some \(n_0\). In particular, it suffices to require that \(I(X_1) < \infty \), although choosing larger values of \(n_0\) considerably enhances the range of applicability of this theorem.

Quantitative estimates on the relative Fisher information in the central limit theorem have also been partly developed. In the i.i.d. case, Barron and Johnson [3] and Artstein et al. [1] derived an asymptotic bound \(I(Z_n||Z) = O(1/n)\) under the hypothesis that the distribution of \(X_1\) satisfies an analytic inequality of Poincaré type

$$\begin{aligned} c\, \mathrm{Var}(u(X_1)) \le \mathbf{E}\, u^{\prime }(X_1)^2. \end{aligned}$$

Here, \(u\) is an arbitrary bounded smooth function on the real line, and \(c\) is a constant depending only on the distribution of \(X_1\) (the spectral gap). More precisely, they established the bound

$$\begin{aligned} I(Z_n||Z) \le \frac{2}{2 + c\,(n-1)}\,I(X_1||Z), \end{aligned}$$

leading to the \(1/n\) rate of convergence in the case \(c>0\) and \(I(X_1) < \infty \). The work [1], which brought in important ideas from [4], also provides a similar bound for weighted sums of the \(X_k\) in terms of the Lyapunov coefficient of order \(4\). Note that Poincaré inequalities hold for a large variety of “nice” probability distributions on the line, all of which have finite exponential moments.

One of the aims of this paper is to study the exact asymptotics (or rates) of \(I(Z_n||Z)\) under standard moment conditions. We prove:

Theorem 1.1.

Let \(\mathbf{E}\, |X_1|^s < \infty \) for an integer \(s \ge 2\), and assume that \(I(Z_{n_0}) < \infty \), for some \(n_0\). Then for certain coefficients \(c_j\) we have, as \(n \rightarrow \infty \),

$$\begin{aligned} I(Z_n||Z) = \frac{c_1}{n} + \frac{c_2}{n^2} + \cdots + \frac{c_{[(s-2)/2]}}{n^{[(s-2)/2]}} + o\left( n^{-\frac{s-2}{2}} \, (\log n)^{-\frac{(s-3)_+}{2}}\right) . \end{aligned}$$
(1.3)

As it turns out, a similar expansion holds as well for the entropic distance \(D(Z_n||Z)\), cf. [11], showing a number of interesting analogies in the asymptotic behavior of these two distances. In particular, in both cases each coefficient \(c_j\) is given by a certain polynomial in the cumulants \(\gamma _3,\ldots ,\gamma _{2j+1}\). In order to describe these polynomials, we first note that, by the moment assumption, the cumulants

$$\begin{aligned} \gamma _r = i^{-r}\, \frac{d^r}{dt^r} \, \log \mathbf{E}\, e^{itX_1}|_{t=0} \end{aligned}$$

are well-defined for all positive integers \(r \le s\), and one may introduce the well-known functions

$$\begin{aligned} q_k(x) \ = \, \varphi (x)\, \sum H_{k + 2j}(x) \, \frac{1}{r_1!\ldots r_k!}\, \left( \frac{\gamma _3}{3!}\right) ^{r_1} \ldots \left( \frac{\gamma _{k+2}}{(k+2)!}\right) ^{r_k} \end{aligned}$$

involving the Chebyshev-Hermite polynomials \(H_l\). Here \(\varphi = \varphi _{0,1}\) denotes the density of the standard normal law, and the summation runs over all non-negative integer solutions \((r_1,\ldots ,r_k)\) to the equation \(r_1 + 2 r_2 + \cdots + k r_k = k\) with \(j = r_1 + \cdots + r_k\).
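For small \(k\) these functions are easy to write out and to evaluate. The following sketch (ours; the function names and test cumulants are for illustration only) builds \(q_1\) and \(q_2\) from the definition, using the probabilists' (Chebyshev–Hermite) polynomials:

```python
# Illustrative construction (ours) of the Edgeworth correction terms q_1 and q_2.
import numpy as np
from numpy.polynomial import hermite_e as He   # Chebyshev-Hermite (probabilists') polynomials

def phi(x):
    return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def q1(x, g3):
    # k = 1: the only solution of r_1 = 1 is r_1 = 1, giving H_3 * gamma_3/3!
    return phi(x) * He.hermeval(x, [0, 0, 0, 1]) * g3 / 6

def q2(x, g3, g4):
    # k = 2: the solutions of r_1 + 2 r_2 = 2 are (r_1, r_2) = (0,1) and (2,0)
    H4 = He.hermeval(x, [0, 0, 0, 0, 1])
    H6 = He.hermeval(x, [0, 0, 0, 0, 0, 0, 1])
    return phi(x) * (H4 * g4 / 24 + H6 * (g3 / 6)**2 / 2)

x = np.linspace(-6, 6, 4001)
dx = x[1] - x[0]
print(np.sum(q1(x, 1.0)) * dx, np.sum(q2(x, 1.0, 1.0)) * dx)   # both integrals are ~ 0
```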

The functions \(q_k\) are correctly defined for \(k = 1,\ldots ,s-2\). They appear in Edgeworth-type expansions approximating the density of \(Z_n\). We shall employ them to derive an expansion in powers of \(1/n\) for the distance \(I(Z_n||Z)\), which leads us to the following description of the coefficients in (1.3),

$$\begin{aligned} c_j \, = \, \sum _{k=2}^{2j}\, (-1)^k \sum \int _{-\infty }^{+\infty } (q_{r_1}^{\prime } + x q_{r_1}) (q_{r_2}^{\prime } + x q_{r_2})\, q_{r_3} \ldots q_{r_k}\, \frac{dx}{\varphi ^{k - 1}}. \end{aligned}$$
(1.4)

Here, the inner summation is carried out over all positive integer tuples \((r_1,\ldots ,r_k)\) such that \(r_1 + \cdots + r_k = 2j\).
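For instance, for \(j = 1\) the sum in (1.4) reduces to the single term with \(k = 2\) and \((r_1,r_2) = (1,1)\). Using \(\varphi ^{\prime }(x) = -x\varphi (x)\), \(H_3^{\prime } = 3 H_2\), and the orthogonality relation \(\int _{-\infty }^{+\infty } H_2(x)^2\, \varphi (x)\,dx = 2!\), one finds

$$\begin{aligned} q_1^{\prime }(x) + x q_1(x) \, = \, \frac{\gamma _3}{3!}\, \big (\varphi ^{\prime } H_3 + \varphi H_3^{\prime } + x \varphi H_3\big )(x) \, = \, \frac{\gamma _3}{2}\, \varphi (x) H_2(x), \end{aligned}$$

so that

$$\begin{aligned} c_1 \, = \, \int _{-\infty }^{+\infty } (q_1^{\prime } + x q_1)^2\, \frac{dx}{\varphi } \, = \, \frac{\gamma _3^2}{4} \int _{-\infty }^{+\infty } H_2(x)^2\, \varphi (x)\,dx \, = \, \frac{1}{2}\,\gamma _3^2. \end{aligned}$$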

In particular, \(c_1 = \frac{1}{2}\,\gamma _3^2\), and in the case \(s=4\) the relation (1.3) becomes

$$\begin{aligned} I(Z_n||Z) = \frac{1}{2n}\,\left( \mathbf{E}X_1^3\right) ^2 + o\left( \frac{1}{n\, (\log n)^{1/2}}\right) . \end{aligned}$$
(1.5)

Hence, under the 4-th moment condition, we have \(I(Z_n||Z) \le \frac{C}{n}\) with some constant \(C\) (which can actually be chosen to depend only on \(\mathbf{E}X_1^4\) and \(I(X_1)\)).
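The asymptotics (1.5) can also be observed numerically. The following sketch (an illustration of ours, with an assumed Gamma-type summand whose characteristic function is known in closed form, and a crude Fourier inversion on a grid) compares \(n\,I(Z_n||Z)\) with the predicted limit \(\frac{1}{2}\,(\mathbf{E}X_1^3)^2\):

```python
# Rough numerical illustration (ours) of the expansion (1.5): n * I(Z_n||Z) should
# approach gamma_3^2 / 2 as n grows.  X1 is a standardized Gamma(k) variable.
import numpy as np

k = 4.0                                   # Gamma shape parameter; E X1^3 = 2/sqrt(k)
gamma3 = 2.0 / np.sqrt(k)

def f1(t):
    # characteristic function of X1 = (G - k)/sqrt(k), G ~ Gamma(k, 1)
    return np.exp(-1j * t * np.sqrt(k)) * (1 - 1j * t / np.sqrt(k))**(-k)

x = np.linspace(-10, 10, 1001)
t = np.linspace(-50, 50, 4001)
dx, dt = x[1] - x[0], t[1] - t[0]
kernel = np.exp(-1j * np.outer(x, t))     # e^{-itx} on the grid

for n in (8, 32, 128):
    fZ = f1(t / np.sqrt(n))**n            # characteristic function of Z_n
    p  = (kernel @ fZ).real * dt / (2 * np.pi)             # density of Z_n
    dp = (kernel @ (-1j * t * fZ)).real * dt / (2 * np.pi) # its derivative
    good = p > 1e-6                       # keep away from the numerically noisy tails
    I_rel = np.sum((dp[good] / p[good] + x[good])**2 * p[good]) * dx
    print(n, n * I_rel, gamma3**2 / 2)    # the last two columns should approach each other
```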

For \(s=6\), the result involves the coefficient \(c_2\) which depends on \(\gamma _3,\gamma _4\), and \(\gamma _5\). If \(\gamma _3 = 0\) (i.e. \(\mathbf{E}X_1^3 = 0\)), we have \(c_1 = 0\), \(c_2 = \frac{1}{6}\,\gamma _4^2\), and then

$$\begin{aligned} I(Z_n||Z) = \frac{1}{6 n^2} \left( \mathbf{E}X_1^4 - 3\right) ^2 + o\left( \frac{1}{n^2\, (\log n)^{3/2}}\right) . \end{aligned}$$

More generally, (1.3) simplifies when the first \(k-1\) moments of \(X_1\) coincide with the corresponding moments of \(Z \sim N(0,1)\).

Corollary 1.2.

Let \(\mathbf{E}\, |X_1|^s < \infty \) \((s \ge 4)\), and assume \(I(Z_{n_0}) < \infty \), for some \(n_0\). Given \(k = 3,4,\ldots ,s\), assume that \(\gamma _j = 0\) for all \(3 \le j < k\). Then

$$\begin{aligned} I(Z_n||Z) \, = \, \frac{\gamma _k^2}{(k-1)!}\cdot \frac{1}{n^{k-2}} + O\left( \frac{1}{n^{k-1}}\right) + o\left( \frac{1}{n^{(s-2)/2}\, (\log n)^{(s-3)/2}}\right) . \end{aligned}$$
(1.6)

This relation is consistent with an observation of Johnson, who noticed that if \(\gamma _k \ne 0\), then \(I(Z_n||Z)\) cannot be asymptotically better than \(n^{-(k-2)}\) ([15], Lemma 2.12).

Let us note that if \(k < \frac{s}{2}\), the \(O\)-term in (1.6) dominates the \(o\)-term. But when \(k \ge \frac{s}{2}\), it can be removed, and if \(k > \frac{s}{2} + 1\), (1.6) just says that

$$\begin{aligned} I(Z_n||Z) \, = \, o\left( n^{-(s-2)/2} \, (\log n)^{-(s-3)/2}\right) . \end{aligned}$$
(1.7)

As for the remaining values \(s=2,3\), there are no coefficients \(c_j\) in the sum (1.3). In case \(s=2\) Theorem 1.1 reduces to Barron–Johnson’s theorem (1.2), while under a 3-rd moment assumption we only have

$$\begin{aligned} I(Z_n||Z) = o\left( \frac{1}{\sqrt{n}}\right) . \end{aligned}$$

In fact, a similar observation holds for the whole range of reals \(2<s<4\). Here the expansion (1.3) should be replaced by the bound (1.7). Although this bound is worse than (1.5), it cannot be essentially improved. As shown in [11], it may happen that \(\mathbf{E}\, |X_1|^s < \infty \) with \(D(X_1) < \infty \) (and actually with \(I(X_1) < \infty \)), while

$$\begin{aligned} D(Z_n||Z)\, \ge \, \frac{c}{n^{(s-2)/2} \ (\log n)^\eta }, \quad n \ge n_1(X_1), \end{aligned}$$

where the constant \(c>0\) depends on \(s\) and an arbitrary prescribed value \(\eta > s/2\). In view of (1.1), a similar lower bound therefore holds for \(I(Z_n||Z)\), as well.

Another interesting issue connected with the convergence theorem (1.2) and the expansion (1.3) is the characterization of distributions for which these results hold. Indeed, the condition \(I(X_1) < \infty \), corresponding to \(n_0 = 1\) in Theorem 1.1, seems much too strong. To this end, we establish an explicit criterion for \(I(Z_{n_0}) < \infty \) to hold for sufficiently large \(n_0\) in terms of the characteristic function \(f_1(t) = \mathbf{E}\,e^{itX_1}\) of \(X_1\).

Theorem 1.3.

Given independent identically distributed random variables \((X_n)_{n \ge 1}\) with finite second moment, the following assertions are equivalent:

  (a) For some \(n_0\), \(Z_{n_0}\) has finite Fisher information;

  (b) For some \(n_1\), \(Z_{n_1}\) has density of bounded total variation;

  (c) For some \(n_2\), \(Z_{n_2}\) has a continuously differentiable density \(p_{n_2}\) such that

    $$\begin{aligned} \int _{-\infty }^\infty |p_{n_2}^{\prime }(x)|\, dx < \infty ; \end{aligned}$$

  (d) For some \(\varepsilon >0\), \(|f_1(t)| = O(t^{-\varepsilon })\), as \(t \rightarrow \infty \);

  (e) For some \(\nu > 0\),

$$\begin{aligned} \int _{-\infty }^\infty |f_1(t)|^\nu \,|t|\, dt < \infty . \end{aligned}$$
(1.8)

Property \((c)\) is a formally strengthened variant of \((b)\), although in general they are not equivalent with \(n_1 = n_2\). (For example, the uniform distribution has a density of bounded total variation, but this density is not everywhere differentiable.)
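The uniform distribution also illustrates conditions \((d)\) and \((e)\). For \(X_1\) uniform on \((0,1)\),

$$\begin{aligned} f_1(t) \, = \, \int _0^1 e^{itx}\,dx \, = \, e^{it/2}\ \frac{\sin (t/2)}{t/2}, \qquad |f_1(t)| \, \le \, \min \Big (1, \frac{2}{|t|}\Big ), \end{aligned}$$

so \((d)\) holds with \(\varepsilon = 1\), and the integral in (1.8) is finite for every \(\nu > 2\), even though \(I(X_1) = \infty \) (in fact, by Lemma 4.2 below, already \(I(Z_3) < \infty \)).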

Properties \((a)\)–\((c)\) are equivalent to each other without any moment assumption, while \((d)\) and \((e)\) are always necessary for the finiteness of \(I(Z_n)\) with large \(n\). These two last conditions show that the range of applicability of Theorem 1.1 is indeed rather wide, since almost all reasonable absolutely continuous distributions satisfy (1.8). The latter should be compared to and viewed as a certain strengthening of the following condition (sometimes called a smoothness condition)

$$\begin{aligned} \int _{-\infty }^\infty |f_1(t)|^\nu \, dt < \infty , \quad \mathrm{for \ some} \ \ \nu > 0. \end{aligned}$$

It is equivalent to the property that, for some \(n\), \(Z_{n}\) has a bounded continuous density \(p_{n}\) (cf. e.g. [5]). In this and only in this case, a uniform local limit theorem holds:

$$\begin{aligned} \Delta _n = \sup _x |p_n(x) - \varphi (x)| \rightarrow 0, \quad \mathrm{as} \ \ n \rightarrow \infty . \end{aligned}$$

That this assertion is weaker compared to the convergence in Fisher information distance such as (1.2) can be seen by Shimizu’s inequality \(\Delta _n^2 \le c I(Z_n||Z)\), which holds with some absolute constant \(c\) ([3, 21], Lemma 1.5). Note in this connection that Shimizu’s inequality may be strengthened in terms of the total variation distance as \(\Vert p_n - \varphi \Vert _\mathrm{TV}^2 \le c I(Z_n||Z)\). Using Theorem 1.3, this shows in the i.i.d. case that (1.2) is equivalent to the convergence \(\Vert p_n - \varphi \Vert _\mathrm{TV} \rightarrow 0\).

The paper is organized in the following way. We start with the description of general properties of densities having finite Fisher information (Sect. 2) and properties of Fisher information as a functional on spaces of densities (showing lower semi-continuity and convexity, Sect. 3). Some of the properties and relations which we state for completeness may be known already. We apologize for being unable to find references for them.

In Sects. 4 and 5 we turn to upper bounds needed mainly in the proof of Theorem 1.3. Further properties of densities emerging after several convolutions, as well as bounds under additional moment assumptions, are discussed in Sects. 6–8. In Sect. 9 we complete the proof of Theorem 1.3, and in the next section we state basic lemmas on Edgeworth-type expansions which are needed in the proof of Theorem 1.1. Sections 11 and 12 are devoted to the proof itself. Some remarks leading to the particular case \(s=2\) in Theorem 1.1 (Barron–Johnson theorem) are given in Sect. 13. Finally, in the last section we briefly describe the modifications needed to obtain Theorem 1.1 under moment assumptions with arbitrary real values of \(s\).

2 General properties of densities with finite Fisher information

Definition

If a random variable \(X\) has an absolutely continuous density \(p\) with Radon–Nikodym derivative \(p^{\prime }\), put

$$\begin{aligned} I(X) = I(p) = \int _{\{p(x)>0\}} \frac{p^{\prime }(x)^2}{p(x)} \, dx. \end{aligned}$$
(2.1)

In this case, if \(\tilde{p}(x) = p(x)\) for almost all \(x\), i.e., if \(\tilde{p}\) is another representative of the density, put \(I(\tilde{p}) = I(p)\). In all other cases, put \(I(X) = \infty \). The quantity \(I(X)\) is called the Fisher information of \(X\).

With this definition, \(I\) is correctly defined as a functional on the space of all densities (and on the space of all probability distributions). However, when \(I(X)<\infty \) and \(p\) is the density of \(X\), we will always assume that \(p\) is chosen to be absolutely continuous. In particular, in this case the derivative \(p^{\prime }(x)\) exists and is finite on a set of full Lebesgue measure.

One may write an equivalent definition involving the score function \(\rho (x) = \frac{p^{\prime }(x)}{p(x)}\). One always has \(\mathbf{P}\{p(X) > 0\} = 1\), so the random variable \(\rho (X)\) is defined with probability 1, and thus

$$\begin{aligned} I(X) = \mathbf{E}\,\rho (X)^2. \end{aligned}$$
(2.2)

For different purposes, it is useful to understand how the ratio \(\frac{p^{\prime }(x)^2}{p(x)}\) may behave when \(p(x)\) is small or even vanishes. The behavior cannot be arbitrary when the Fisher information is finite. The following statement will allow us to make rigorous the derivation of various Fisher information bounds on the density and its derivatives.

Proposition 2.1

Assume \(X\) has density \(p\) with finite Fisher information. If \(p\) is differentiable at the point \(x_0\) such that \(p(x_0) = 0\), then \(p^{\prime }(x_0) = 0\).

Proof

If \(p\) is differentiable in some neighborhood of \(x_0\), and its derivative is continuous at this point, the statement is obvious. In the general case, for simplicity of notation, let \(x_0 = 0\), and assume \(c = p^{\prime }(0) > 0\). Since \(p(\varepsilon ) = c\varepsilon + o(\varepsilon )\), as \(\varepsilon \rightarrow 0\), one may choose \(\varepsilon _0>0\) such that

$$\begin{aligned} \frac{3c}{4}\,|x| \le p(x) \le \frac{5c}{4}\,|x|, \quad \mathrm{for \ all} \ \ |x| \le \varepsilon _0. \end{aligned}$$

In particular, \(p\) is positive on \((0,\varepsilon _0]\). Hence, according to (2.1),

$$\begin{aligned} I(X) \ge \int _0^{\varepsilon _0} \frac{p^{\prime }(x)^2}{p(x)}\,dx \ge \frac{4}{5c}\, \int _0^{\varepsilon _0} \frac{p^{\prime }(x)^2}{x}\,dx. \end{aligned}$$

We split the last integral into the intervals \(\Delta _n = (2^{-(n+1)}\varepsilon _0,2^{-n}\varepsilon _0)\), which leads to

$$\begin{aligned} \frac{5c}{4}\,I(X) \, \ge \, \sum _{n=0}^\infty \, \frac{2^n}{\varepsilon _0} \int _{\Delta _n} p^{\prime }(x)^2\,dx. \end{aligned}$$

Now, applying Cauchy’s inequality and using \(p(x) - p(\frac{x}{2}) \ge \frac{c}{8}\,x\) for \(0 \le x \le \varepsilon _0\), we obtain

$$\begin{aligned} \int _{\Delta _n} p^{\prime }(x)^2\,dx&\ge \frac{2^{n+1}}{\varepsilon _0}\, \left( \int _{\Delta _n} p^{\prime }(x)\,dx\right) ^2 \\&= \frac{2^{n+1}}{\varepsilon _0}\, \left( p(2^{-n}\varepsilon _0) - p(2^{-(n+1)}\varepsilon _0)\right) ^2 \, \ge \, 2^{-(n-1)}\,\frac{c^2 \varepsilon _0}{64}. \end{aligned}$$

As a result,

$$\begin{aligned} \frac{5c}{4}\,I(X) \, \ge \, \sum _{n=0}^\infty 2^n \cdot 2^{-(n-1)} \cdot \frac{c^2}{64} = \infty , \end{aligned}$$

a contradiction with finiteness of the Fisher information. \(\square \)

As an example illustrating a possible behavior as in Proposition 2.1, one may consider the beta distribution with parameters \(\alpha = \beta = 3\), which has the density

$$\begin{aligned} p(x) = 30\,(x(1-x))^2, \quad 0 \le x \le 1. \end{aligned}$$

Then \(X\) has finite Fisher information, although \(p(x_0) = p^{\prime }(x_0) = 0\) at \(x_0 = 0\) and \(x_0 = 1\).
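Indeed, a direct computation gives, for \(0< x <1\),

$$\begin{aligned} \frac{p^{\prime }(x)}{p(x)} \, = \, \frac{2\,(1-2x)}{x(1-x)}, \qquad \frac{p^{\prime }(x)^2}{p(x)} \, = \, 120\,(1-2x)^2, \end{aligned}$$

so that \(I(X) = 120 \int _0^1 (1-2x)^2\,dx = 40\).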

More generally, if a density \(p\) is supported and twice differentiable on a finite interval \([a,b]\), has finitely many zeros \(x_0 \in [a,b]\), and satisfies \(p^{\prime }(x_0) = 0\), \(p^{\prime \prime }(x_0) > 0\) at every such point, then \(X\) has finite Fisher information.

Now, let us return to the definitions (2.1)–(2.2). By Cauchy’s inequality,

$$\begin{aligned} I(X)^{1/2} = \left( \mathbf{E}\,\rho (X)^2\right) ^{1/2} \ge \mathbf{E}\,|\rho (X)| = \int _{\{p(x)>0\}} |p^{\prime }(x)|\,dx. \end{aligned}$$

Here, by Proposition 2.1, the last integral may be extended to the whole real line without any change, and then it represents the total variation of the function \(p\) in the usual sense of the Theory of Functions:

$$\begin{aligned} \Vert p\Vert _{\mathrm{TV}} \, = \, \sup \, \sum _{k=1}^n |p(x_k) - p(x_{k-1})|, \end{aligned}$$

where the supremum runs over all finite collections \(x_0 < x_1 < \cdots < x_n\).

In the sequel, we consider this norm also for densities which are not necessarily continuous, and then it is natural to require that, for each \(x\), the value \(p(x)\) lies in the closed segment \(\Delta (x)\) with endpoints \(p(x-)\) and \(p(x+)\). Note that if we change \(p(x)\) at a point of discontinuity such that \(p(x)\) goes out of \(\Delta (x)\), then the probability measure \(\mu (dx) = p(x)dx\) with density \(p\) is unchanged, while \(\Vert p\Vert _{\mathrm{TV}}\) will increase.

Let us note that the same notation \(\Vert \nu \Vert _{\mathrm{TV}}\) in the sense of the Measure Theory is commonly used to denote the total variation of a signed Borel measure \(\nu \) on the real line. The connection with the Theory of Functions is simply \(\Vert \nu \Vert _{\mathrm{TV}} = \Vert F\Vert _{\mathrm{TV}}\) in terms of the cumulative “distribution function” \(F(x) = \nu ((-\infty ,x])\).

Returning to the Fisher information, we thus observed that, if \(I(X)\) is finite, the density \(p\) of \(X\) is a function of bounded variation. Hence, the limits

$$\begin{aligned} p(-\infty ) = \lim _{x \rightarrow -\infty } p(x), \quad p(\infty ) = \lim _{x \rightarrow \infty } p(x) \end{aligned}$$

exist and are finite. But, since \(p\) is a density (hence integrable), these limits must be zero. In addition, for any \(x\),

$$\begin{aligned} p(x) = \int _{-\infty }^x p^{\prime }(y)\,dy \le \int _{-\infty }^x |p^{\prime }(y)|\,dy \le \sqrt{I(X)}. \end{aligned}$$

We can summarize these observations in the following:

Proposition 2.2

If \(X\) has density \(p\) with finite Fisher information \(I(X)\), then \(p(-\infty ) = p(\infty ) = 0\), and the density has finite total variation satisfying

$$\begin{aligned} \Vert p\Vert _{\mathrm{TV}} = \int _{-\infty }^\infty |p^{\prime }(x)|\,dx \le \sqrt{I(X)}. \end{aligned}$$

In particular, \(p\) is bounded: \(\max _x p(x) \le \sqrt{I(X)}\).

Let \(f(t) = \mathbf{E}\,e^{itX}\) denote the characteristic function of a random variable \(X\) with density \(p\). Since in general \(|f(t)| \le \frac{\Vert p\Vert _{\mathrm{TV}}}{|t|}\), an immediate consequence of Proposition 2.2 is a similar bound

$$\begin{aligned} |f(t)| \le \frac{1}{|t|}\,\sqrt{I(X)}, \end{aligned}$$

involving the Fisher information. Here, as noticed by Zhang [24], the behavior near the origin can be improved by using the Cramér–Rao inequality which yields:

Proposition 2.3

If \(X\) has finite Fisher information, then its characteristic function \(f(t)\) admits the bound

$$\begin{aligned} |f(t)|^2 \le \frac{I(X)}{I(X) + t^2}, \quad t \in \mathbf{R}. \end{aligned}$$
(2.3)

Indeed, for any smooth function \(u:\mathbf{R}\rightarrow \mathbf{C}\) such that \(\mathbf{E}\, |u^{\prime }(X)| < \infty \), one has, by integration by parts and applying Cauchy’s inequality,

$$\begin{aligned} |\mathbf{E}\, u^{\prime }(X)|^2 \le I(X)\, \mathbf{E}\, |u(X)|^2. \end{aligned}$$

In case \(u(x) = e^{itx} - f(t)\), this gives (2.3); cf. also [24] for a slightly different argument.
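As a quick consistency check (our own remark), for \(Z \sim N(0,1)\) one has \(I(Z) = 1\), and (2.3) becomes

$$\begin{aligned} |f(t)|^2 \, = \, e^{-t^2} \, \le \, \frac{1}{1 + t^2}, \quad t \in \mathbf{R}, \end{aligned}$$

which indeed holds, with equality up to the order \(t^2\) near the origin.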

Another immediate consequence of Proposition 2.2 is that both \(p\) and \(p^{\prime }\) are square integrable, that is, \(p\) belongs to the Sobolev space \(W_1^2 = W_1^2(-\infty ,\infty )\) of all real-valued absolutely continuous functions on the real line with finite (Hilbert) norm

$$\begin{aligned} \Vert u\Vert _{W_1^2}^2 = \int _{-\infty }^\infty u(x)^2\,dx + \int _{-\infty }^\infty u^{\prime }(x)^2\,dx. \end{aligned}$$

More precisely,

$$\begin{aligned} \int _{-\infty }^\infty p^{\prime }(x)^2\,dx \, = \, \int _{-\infty }^\infty \frac{p^{\prime }(x)^2}{p(x)}\,p(x)\,dx \, \le \, \max _x p(x) \int _{-\infty }^\infty \frac{p^{\prime }(x)^2}{p(x)}\,dx \, \le \, I(X)^{3/2}. \end{aligned}$$
(2.4)

By the inverse Fourier formula, the resulting inequality in (2.4) is equivalent to the following integral analogue of the pointwise bound (2.3).

Proposition 2.4

The characteristic function \(f(t)\) of any random variable \(X\) satisfies

$$\begin{aligned} \int _{-\infty }^\infty |t f(t)|^2\,dt \, \le \, 2\pi \,I(X)^{3/2}. \end{aligned}$$
(2.5)

In particular, when the Fisher information of \(X\) is finite, so is the integral in (2.5).

Let us return to Proposition 2.2. Since the estimate on the total variation norm \(\Vert p\Vert _{\mathrm{TV}}\) can be given in terms of the Fisher information, it is natural to ask whether or not it is possible to bound the total variation distance from \(p\) to a normal density in terms of the relative Fisher information. This suggests the following bound.

Proposition 2.5

If \(X\) has mean zero, variance one, and a density \(p\) with finite Fisher information, then

$$\begin{aligned} \Vert p - \varphi \Vert _{\mathrm{TV}} \le 4\sqrt{I(X||Z)}, \end{aligned}$$
(2.6)

where \(Z\) has the standard normal density \(\varphi \).

Proof

Using

$$\begin{aligned} p^{\prime }(x) - \varphi ^{\prime }(x) = \left( \frac{p^{\prime }(x)}{p(x)} - \frac{\varphi ^{\prime }(x)}{\varphi (x)}\right) p(x) - x\,(p(x) - \varphi (x)) \quad (p(x)>0) \end{aligned}$$

and applying Cauchy’s inequality, we may write

$$\begin{aligned} \Vert p - \varphi \Vert _{\mathrm{TV}}&= \int _{-\infty }^\infty |p^{\prime }(x) - \varphi ^{\prime }(x)|\,dx \nonumber \\&\le I(X||Z)^{1/2} + \int _{-\infty }^\infty |x|\,|p(x) - \varphi (x)|\,dx. \end{aligned}$$
(2.7)

The last integral represents a weighted total variation distance between the distributions of \(X\) and \(Z\) with weight function \(w(x) = |x|\).

At this step we apply the following extension of the Csiszár–Kullback–Pinsker (CKP) inequality to the scheme of weighted total variation distances, proposed by Bolley and Villani, cf. [12], Theorem 2.1 (ii). If \(X\) and \(Y\) are random variables with densities \(p\) and \(q\), and \(w(x) \ge 0\) is a measurable function, then

$$\begin{aligned} \left( \,\,\int _{-\infty }^\infty w(x)\,|p(x) - q(x)|\,dx\right) ^2 \, \le \, C D(X||Y) \, = \, C \int _{-\infty }^\infty p(x)\,\log \frac{p(x)}{q(x)}\,dx, \end{aligned}$$

where

$$\begin{aligned} C \, = \, 2\, \left( 1 + \log \int _{-\infty }^\infty e^{w(x)^2} q(x)\,dx\right) . \end{aligned}$$

The inequality also holds in the setting of abstract measurable spaces, and when \(w=1\) it yields the classical CKP inequality with an additional factor \(2\).

In our case, \(Y=Z\), \(q = \varphi \), and taking \(w(x) = \sqrt{t/2}\, |x|\) \((0 < t < 1)\), we get

$$\begin{aligned} \frac{t}{2}\, \left( \,\,\int _{-\infty }^\infty |x|\,|p(x) - \varphi (x)|\,dx\right) ^2 \, \le \, \left( 2 + \log \frac{1}{1-t}\right) \,D(X||Z). \end{aligned}$$

One may choose, for example, \(t = 1 - \frac{1}{e}\), and recalling (1.1), we arrive at

$$\begin{aligned} \int _{-\infty }^\infty |x|\,|p(x) - \varphi (x)|\,dx \, \le \, 3.1\,D(X||Z)^{1/2} \le \frac{3.1}{\sqrt{2}} \, I(X||Z)^{1/2}. \end{aligned}$$

It remains to use this bound in (2.7), and (2.6) follows. \(\square \)

3 Fisher information as a functional

It is worthwhile to discuss separately a few general properties of the Fisher information viewed as a functional on the space of densities. We start with topological properties.

Proposition 3.1

Let \((X_n)_{n \ge 1}\) be a sequence of random variables, and \(X\) be a random variable such that \(X_n \Rightarrow X\) weakly in distribution. Then

$$\begin{aligned} I(X) \le \liminf _{n \rightarrow \infty } \, I(X_n). \end{aligned}$$
(3.1)

Denote by \(\mathfrak{P }_1\) the collection of all (probability) densities on the real line with finite Fisher information, and let \(\mathfrak{P }_1(I)\) denote the subset of all densities whose Fisher information is at most \(I>0\). On the set \(\mathfrak{P }_1\) the relation (3.1) may be written as

$$\begin{aligned} I(p) \le \liminf _{n \rightarrow \infty } \, I(p_n), \end{aligned}$$
(3.2)

which holds under the condition that the corresponding distributions are convergent weakly, i.e.,

$$\begin{aligned} \lim _{n \rightarrow \infty } \, \int _{-\infty }^a p_n(x)\,dx \, = \, \int _{-\infty }^a p(x)\,dx, \quad \mathrm{for \ all} \ \ a \in \mathbf{R}. \end{aligned}$$
(3.3)

Hence, every \(\mathfrak{P }_1(I)\) is closed in the weak topology. In fact, inside such sets (3.3) can be strengthened to the convergence in the \(L^1\)-metric,

$$\begin{aligned} \lim _{n \rightarrow \infty } \, \int _{-\infty }^\infty |p_n(x) - p(x)|\,dx \, = \, 0. \end{aligned}$$
(3.4)

Proposition 3.2

On every set \(\mathfrak{P }_1(I)\) the weak topology with convergence (3.3) and the topology generated by the \(L^1\)-norm coincide, and the Fisher information is a lower semi-continuous functional on this set.

Proof

For the proof of Proposition 3.1, one may assume that \(I(X_n) \rightarrow I\), for some (finite) constant \(I\). Then, for sufficiently large \(n\), the \(X_n\) have absolutely continuous densities \(p_n\) with Fisher information at most \(I+1\). By Proposition 2.2, such densities are uniformly bounded and have uniformly bounded variations. Hence, by the second Helly theorem (cf. e.g. [16]), there are a subsequence \(p_{n_k}\) and a function \(p\) of bounded variation, such that \(p_{n_k}(x) \rightarrow p(x)\), as \(k \rightarrow \infty \), for all points \(x\). Necessarily, \(p(x) \ge 0\) and \(\int _{-\infty }^\infty p(x)\,dx \le 1\). Since the sequence of distributions of \(X_n\) is tight (or weakly pre-compact), it also follows that \(\int _{-\infty }^\infty p(x)\,dx = 1\). Hence, \(X\) has an absolutely continuous distribution with \(p\) as its density, and the weak convergence (3.3) holds.

For the proof of Proposition 3.2, a similar argument should be applied to an arbitrary prescribed subsequence \(p_{n_k}\), where we obtain \(p(x) = \lim _{l \rightarrow \infty } p_{n_{k_l}}(x)\) for some further subsequence. By Scheffé’s lemma, this property implies the convergence in \(L^1\)-norm, that is, (3.4) holds along \(n_{k_l}\). This implies the convergence in \(L^1\) for the whole sequence \(p_n\), which is the assertion of Proposition 3.2 (the first part).

To continue the proof of Proposition 3.1, for simplicity of notations, assume that the subsequence constructed in the first step is actually the whole sequence. By (2.4),

$$\begin{aligned} \int _{-\infty }^\infty p_n^{\prime }(x)^2\,dx \le (I + 1)^{3/2}, \end{aligned}$$

which implies that the derivatives are uniformly integrable on every finite interval. By the Dunford-Pettis compactness criterion for the space \(L^1\) (over finite measures), there is a subsequence \(p_{n_k}^{\prime }\) which is convergent to some locally integrable function \(u\) in the sense that

$$\begin{aligned} \int _A p_{n_k}^{\prime }(x)\,dx \rightarrow \int _A u(x)\,dx, \end{aligned}$$
(3.5)

for any bounded Borel set \(A \subset \mathbf{R}\). (This is the weak \(\sigma (L^1,L^\infty )\) convergence on finite intervals.) Note that, according to Proposition 2.1, \(p_{n_k}^{\prime }(x)\) may be replaced in (3.5) with the sequence \(p_{n_k}^{\prime }(x)\, 1_{\{p_{n_k}(x) > 0\}}\) which is thus convergent to \(u(x)\) as well.

Taking finite intervals \(A = (a,b)\) in (3.5), we get

$$\begin{aligned} \int _a^b u(x)\,dx = p(b) - p(a), \end{aligned}$$

which means that \(p\) is (locally) absolutely continuous. Furthermore, since

$$\begin{aligned} \Vert p\Vert _{\mathrm{TV}} = \int _{-\infty }^\infty |u(x)|\,dx, \end{aligned}$$

and since \(p\) has finite total variation, we conclude that \(u \in L^1(\mathbf{R})\), thus representing a Radon–Nikodym derivative: \(u(x) = p^{\prime }(x)\). Again, for simplicity of notations, assume the subsequence of derivatives obtained is actually the whole sequence.

Next, consider the sequence of functions

$$\begin{aligned} \xi _n(x) = \frac{p_n^{\prime }(x)}{\sqrt{p_n(x)}}\, 1_{\{p_n(x) > 0\}}. \end{aligned}$$

They have \(L^2(\mathbf{R})\)-norm bounded by \(\sqrt{I+1}\) (for large \(n\)). Since the unit ball of \(L^2\) is weakly compact, there is a subsequence \(\xi _{n_k}\) which is weakly convergent to some function \(\xi \in L^2\), i.e.,

$$\begin{aligned} \int _{-\infty }^\infty \xi _{n_k}(x)\, q(x)\,dx \rightarrow \int _{-\infty }^\infty \xi (x)\, q(x)\,dx, \end{aligned}$$

for any \(q \in L^2\). As a consequence,

$$\begin{aligned} \int _{-\infty }^\infty \xi _{n_k}(x)\, \sqrt{p_{n_k}(x)}\, q(x)\,dx \rightarrow \int _{-\infty }^\infty \xi (x)\, \sqrt{p(x)}\, q(x)\,dx, \end{aligned}$$

due to the uniform boundedness and pointwise convergence of \(p_n\). In other words, again omitting sub-indices, the functions \(p_n^{\prime }\, 1_{\{p_n > 0\}}\) are weakly convergent in \(L^2\) to the function \(\xi \sqrt{p}\). In particular, for \(q = 1_A\) with an arbitrary bounded Borel set \(A \subset \mathbf{R}\),

$$\begin{aligned} \int _A p_n^{\prime }\, 1_{\{p_n > 0\}}\,dx \rightarrow \int _A \xi (x)\sqrt{p(x)}\ dx. \end{aligned}$$

As a result, we have obtained two limits for \(p_n^{\prime }\, 1_{\{p_n > 0\}}\) which must coincide, i.e., we get \(\xi \sqrt{p} = u = p^{\prime }\) a.e. Hence, \(p = 0 \Rightarrow p^{\prime } = 0\) and \(\xi = \frac{p^{\prime }}{\sqrt{p}}\)   a.e. on the set \(\{p(x) > 0\}\). Finally, the weak convergence \(\xi _{n_k} \rightarrow \xi \) in \(L^2\), as in any Banach space, yields

$$\begin{aligned} I(p) \, = \, \Vert \xi \cdot 1_{\{p>0\}}\Vert _{L^2}^2 \, \le \, \Vert \xi \Vert _{L^2}^2 \, \le \, \liminf _{k \rightarrow \infty }\, \Vert \xi _{n_k}\Vert _{L^2}^2 \, = \, \liminf _{k \rightarrow \infty } \, I(p_{n_k}) \, = \, I. \end{aligned}$$

Thus, Proposition 3.1 is proved. \(\square \)

Another general property of the Fisher information is its convexity, that is, we have the inequality

$$\begin{aligned} I(p) \le \sum _{i=1}^n \alpha _i I(p_i), \end{aligned}$$
(3.6)

where \(p = \sum _{i=1}^n \alpha _i p_i\) with arbitrary densities \(p_i\) and weights \(\alpha _i > 0\), \(\sum _{i=1}^n \alpha _i = 1\). This readily follows from the fact that the homogeneous function \(R(u,v) = u^2/v\) is convex on the upper half-plane \(u \in \mathbf{R}\), \(v>0\). Moreover, Cohen [14] showed that the inequality (3.6) is strict.

As a consequence, the collection \(\mathfrak{P }_1(I)\) of all densities on the real line with Fisher information \(\le I\) represents a convex closed set in the space \(L^1 = L^1(\mathbf{R})\) (for strong or weak topologies).
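The convexity inequality (3.6) is also easy to test numerically. A minimal sketch (ours, with ad hoc parameters), using \(I(N(0,\sigma ^2)) = 1/\sigma ^2\):

```python
# Minimal numerical check (ours) of the convexity inequality (3.6) for a
# two-component normal mixture; recall that I(N(0, s^2)) = 1/s^2.
import numpy as np

x = np.linspace(-20, 20, 80001)
dx = x[1] - x[0]

def phi(x, s):
    return np.exp(-x**2 / (2 * s**2)) / np.sqrt(2 * np.pi * s**2)

alpha, s1, s2 = 0.3, 1.0, 2.0
p  = alpha * phi(x, s1) + (1 - alpha) * phi(x, s2)
dp = alpha * phi(x, s1) * (-x / s1**2) + (1 - alpha) * phi(x, s2) * (-x / s2**2)
I_mix = np.sum(dp**2 / p) * dx
print(I_mix, alpha / s1**2 + (1 - alpha) / s2**2)   # the first number is strictly smaller
```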

We need to extend Jensen’s inequality (3.6) to arbitrary “continuous” convex mixtures of densities. In order to formulate this more precisely, recall the definition of mixtures. Denote by \(\mathfrak{P }\) the collection of all densities which represents a closed subset of \(L^1\) with the weak \(\sigma (L^1,L^\infty )\) topology. For any Borel set \(A \subset \mathbf{R}\), the functionals \(q \rightarrow \int _A q(x)\,dx\) are bounded and continuous on \(\mathfrak{P }\). So, given a Borel probability measure \(\pi \) on \(\mathfrak{P }\), one may introduce the probability measure on the real line

$$\begin{aligned} \mu (A) = \int _\mathfrak{P } \left[ \int _A q(x)\,dx\right] \,d\pi (q). \end{aligned}$$
(3.7)

It is absolutely continuous with respect to Lebesgue measure and has some density \(p(x) = \frac{d\mu (x)}{dx}\) called the (convex) mixture of densities with mixing measure \(\pi \). For short,

$$\begin{aligned} p(x) = \int _\mathfrak{P } q(x)\,d\pi (q). \end{aligned}$$

Proposition 3.3

If \(p\) is a convex mixture of densities with mixing measure \(\pi \), then

$$\begin{aligned} I(p) \le \int _\mathfrak{P } I(q)\,d\pi (q). \end{aligned}$$
(3.8)

Proof

Note that the integral in (3.8) makes sense, since the functional \(q \rightarrow I(q)\) is lower semi-continuous and hence Borel measurable on \(\mathfrak{P }\) (Proposition 3.1). We may assume that this integral is finite, so that \(\pi \) is supported on the convex (Borel measurable) set \(\mathfrak{P }_1 = \cup _I \mathfrak{P }_1(I)\).

Identifying densities with corresponding probability measures (having these densities), we consider \(\mathfrak{P }_1\) as a subset of the locally convex space \(E\) of all finite Borel measures \(\mu \) on the real line endowed with the weak topology.

Step 1. Suppose that the measure \(\pi \) is supported on some convex compact set \(K\) contained in \(\mathfrak{P }_1(I)\). Since the functional \(q \rightarrow I(q)\) is finite, convex and lower semi-continuous on \(K\), it admits the representation

$$\begin{aligned} I(q) \, = \, \sup _{l \in \mathfrak{L }} \, l(q), \quad q \in K, \end{aligned}$$

where \(\mathfrak{L }\) denotes the family of all continuous affine functionals \(l\) on \(E\) such that \(l(q) < I(q)\), for all \(q \in K\) (cf. e.g. Meyer [18], Chapter XI, Theorem T7). In our particular case, any such functional acts on probability measures as \(l(\mu ) = \int _{-\infty }^\infty \psi (x)\,d\mu (x)\) with some bounded continuous function \(\psi \) on the real line. Hence,

$$\begin{aligned} I(q) \, = \, \sup _{\psi \in \mathfrak{C }} \, \int _{-\infty }^\infty q(x)\psi (x)\,dx, \end{aligned}$$

for some family \(\mathfrak{C }\) of bounded continuous functions \(\psi \) on \(\mathbf{R}\). An explicit description of \(\mathfrak{C }\) would be of interest, but this question will not be pursued here. As a consequence, by the definition (3.7) for the measure \(\mu \) with density \(p\),

$$\begin{aligned} \int _\mathfrak{P } I(q)\,d\pi (q)&\ge \sup _{\psi \in \mathfrak{C }} \ \int _\mathfrak{P } \left[ \int _{-\infty }^\infty q(x)\psi (x)\,dx\right] \,d\pi (q) \\&= \sup _{\psi \in \mathfrak{C }} \ \int _{-\infty }^\infty p(x)\psi (x)\,dx \ = \ I(p), \end{aligned}$$

which is the desired inequality (3.8).

Step 2. Suppose that \(\pi \) is supported on \(\mathfrak{P }_1(I)\), for some \(I>0\). Since any finite measure on \(E\) is Radon, and since the set \(\mathfrak{P }_1(I)\) is closed and convex, there is an increasing sequence of compact subsets \(K_n \subset \mathfrak{P }_1(I)\) such that \(\pi (\cup _n K_n) = 1\). Moreover, \(K_n\) can be chosen to be convex (since the closure of the convex hull will be compact, as well). Let \(\pi _n\) denote the normalized restriction of \(\pi \) to \(K_n\) with sufficiently large \(n\) so that \(c_n = \pi (K_n) > 0\), and define its barycenter

$$\begin{aligned} p_n(x) = \int _{K_n} q(x)\,d\pi _n(q). \end{aligned}$$
(3.9)

From (3.7) it follows that the measures with densities \(p_n\) are weakly convergent to the measure \(\mu \) with density \(p\), hence the relation (3.2) holds: \(I(p) \le \liminf _{n \rightarrow \infty } \, I(p_n)\). On the other hand, by the previous step,

$$\begin{aligned} I(p_n) \, \le \, \int _{K_n} I(q)\,d\pi _n(q) = \frac{1}{c_n}\, \int _{K_n} I(q)\,d\pi (q) \, \rightarrow \int _{\mathfrak{P }_1(I)} I(q)\,d\pi (q), \end{aligned}$$
(3.10)

and we obtain (3.8).

Step 3. In the general case, we may apply Step 2 to the normalized restrictions \(\pi _n\) of \(\pi \) to the sets \(K_n = \mathfrak{P }_1(n)\). Again, for the densities \(p_n\) defined as in (3.9), we obtain (3.10), where \(\mathfrak{P }_1(I)\) should be replaced with \(\mathfrak{P }_1\). Another application of the lower semi-continuity of the Fisher information finishes the proof. \(\square \)

4 Convolution of three densities of bounded variation

Although densities with finite Fisher information must be functions of bounded variation, the converse is not always true. Nevertheless, starting from a density of bounded variation and taking several convolutions with itself, the resulting density will have finite Fisher information. Our next aim is to prove:

Proposition 4.1

If independent random variables \(X_1,X_2,X_3\) have densities \(p_1,p_2,\) \(p_3\) with finite total variation, then \(S = X_1 + X_2 + X_3\) has finite Fisher information, and moreover,

$$\begin{aligned} I(S) \, \le \, \frac{1}{2}\,\Big [ \Vert p_1\Vert _{\mathrm{TV}} \, \Vert p_2\Vert _{\mathrm{TV}} + \Vert p_1\Vert _{\mathrm{TV}} \, \Vert p_3\Vert _{\mathrm{TV}} + \Vert p_2\Vert _{\mathrm{TV}} \, \Vert p_3\Vert _{\mathrm{TV}}\Big ]. \end{aligned}$$
(4.1)

One may further extend (4.1) to sums of more than 3 independent summands, but this will not be needed for our purposes (since the Fisher information may only decrease when adding an independent summand).

In the i.i.d. case the above estimate can be simplified. By a direct application of the inverse Fourier formula, the right-hand side of (4.1) may be related furthermore to the characteristic functions of \(X_j\). We will return to this in the next section.

First let us look at the particular case where \(X_j\) are uniformly distributed over intervals. This important example already shows that the Fisher information \(I(X_1 + X_2)\) does not need to be finite, while it is finite for 3 summands. (This somewhat curious fact was pointed out to one of the authors by K. Ball.) In fact, there is a simple quantitative bound.

Lemma 4.2

If independent random variables \(X_1,X_2,X_3\) are uniformly distributed on intervals of lengths \(a_1,a_2,a_3\), then

$$\begin{aligned} I(X_1 + X_2 + X_3) \, \le \, 2\, \left[ \frac{1}{a_1 a_2} + \frac{1}{a_1 a_3} + \frac{1}{a_2 a_3}\right] . \end{aligned}$$
(4.2)

The density of the sum \(S = X_1 + X_2 + X_3\) may easily be evaluated and leads to a rather routine problem of estimation of \(I(S)\) as a function of the parameters \(a_j\). Alternatively, there is an elegant approach based on the Brunn–Minkowski inequality and the fact that the density \(p\) of \(S\) behaves like the beta density near the end points of the supporting interval.
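For orientation, the direct route is easy to carry out numerically in the simplest case \(a_1 = a_2 = a_3 = 1\) (a sketch of ours, using a discretized convolution):

```python
# Numerical evaluation (ours) of I(X1 + X2 + X3) for three independent uniform[0,1]
# summands, to be compared with the bound 2*(1 + 1 + 1) = 6 from Lemma 4.2.
import numpy as np

dx = 1e-3
u = np.ones(int(1 / dx))                              # uniform density on [0,1)
p = np.convolve(np.convolve(u, u), u) * dx**2         # density of the sum on [0,3)
dp = np.gradient(p, dx)
good = p > 1e-10
I_S = np.sum(dp[good]**2 / p[good]) * dx
print(I_S)                                            # roughly 4.6, below the bound 6
```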

To describe the argument, first let us recall the volume relation (Brunn’s theorem)

$$\begin{aligned} |tA + (1-t)B|^{1/2} \ge t\, |A|^{1/2} + (1-t)\, |B|^{1/2}, \quad 0 < t < 1, \end{aligned}$$
(4.3)

which holds for arbitrary non-empty Borel sets \(A,B\) lying in parallel hyperplanes of the Euclidean space \(\mathbf{R}^3\). Here

$$\begin{aligned} tA + (1-t)B = \{\,ta + (1-t)b:\, a \in A, \ b \in B\} \end{aligned}$$

stands for the Minkowski sum, and \(|C|\) is used to denote the two-dimensional Lebesgue measure of a set \(C\) in the hyperplane where it lies (cf. e.g. [13]). But the random vector \((X_1,X_2,X_3)\) is uniformly distributed in the cube \(Q \subset \mathbf{R}^3\) with sides \(a_j\), so, the density of \(S\) is given by

$$\begin{aligned} p(x) = \frac{1}{a_1 a_2 a_3}\, |\{(x_1,x_2,x_3) \in Q: x_1 + x_2 + x_3 = x\}|. \end{aligned}$$

Hence, by (4.3), the function \(p^{1/2}\) is concave on the supporting interval.

The latter property may also be formulated in terms of the transform

$$\begin{aligned} L(t) = p(F^{-1}(t)), \quad 0 < t < 1. \end{aligned}$$

Here, \(F^{-1}:(0,1) \rightarrow (x_0,x_1)\) denotes the inverse of the distribution function \(F(x) = \mu (x_0,x)\), associated to a given probability measure \(\mu \) which is supported and has a positive continuous density \(p\) on some interval \((x_0,x_1)\), finite or not. Namely, \(p^{1/2}\) is concave on \((x_0,x_1)\), if and only if the function \(L^{3/2}\) is concave on \((0,1)\). Indeed, assuming without loss of generality that \(p\) has a continuous derivative, we have \(L^{\prime }(F(x)) = \frac{p^{\prime }(x)}{p(x)}\) and thus

$$\begin{aligned} \frac{1}{3}\, (L^{3/2})^{\prime }(F(x)) = (p^{1/2})^{\prime }(x), \quad x_0 < x < x_1. \end{aligned}$$

Therefore, the derivative \((p^{1/2})^{\prime }\) does not increase on \((x_0,x_1)\), if and only if \((L^{3/2})^{\prime }\) does not increase on \((0,1)\). We refer to [8] for related issues about the so-called \(\kappa \)-concave probability measures and more general characterizations.

Note also that the Fisher information of a random variable \(X\) with density \(p\) is expressed in terms of the associated function \(L\) as

$$\begin{aligned} I(X) = \int _0^1 L^{\prime }(t)^2\,dt. \end{aligned}$$
(4.4)

This general formula holds whenever \(p\) is absolutely continuous and positive on the supporting interval (without any concavity assumption).
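Indeed, substituting \(t = F(x)\), \(dt = p(x)\,dx\), and using \(L^{\prime }(F(x)) = \frac{p^{\prime }(x)}{p(x)}\) as above, we get

$$\begin{aligned} \int _0^1 L^{\prime }(t)^2\,dt \, = \, \int _{x_0}^{x_1} \left( \frac{p^{\prime }(x)}{p(x)}\right) ^2 p(x)\,dx \, = \, I(X). \end{aligned}$$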

Proof of Lemma 4.2

Let \(X_j\) take values in \([0,a_j]\). As was just explained, the distribution of \(S = X_1 + X_2 + X_3\) has density \(p\) such that \(p^{1/2}\) is concave on the interval \([0,a_1+a_2+a_3]\), or equivalently, \(L^{3/2}\) is concave on \((0,1)\), where \(L\) is the associated function for \(S\).

Note that \(S\) has an absolutely continuous density \(p\), which is thus vanishing at the end points \(x = 0\) and \(x = a_1 + a_2 + a_3\). Hence, \(L(0+) = L(1-) = 0\). By the concavity, there is a non-increasing Radon–Nikodym derivative \((L^{3/2})^{\prime } = \frac{3}{2}\,L^{1/2}\, L^{\prime }\). Since also \(L\) is symmetric about the point \(\frac{1}{2}\), we get, for all \(0 < t < 1\),

$$\begin{aligned} L^{\prime }(t)^2\, L(t) \, \le \, c, \quad \mathrm{where} \ \ \ \ c \, = \, \lim _{t \rightarrow 0}\, L^{\prime }(t)^2\, L(t). \end{aligned}$$

Hence, by (4.4),

$$\begin{aligned} I(S) \, \le \, \int _0^1 \frac{c}{L(t)}\,dt \, = \, c\,(a_1 + a_2 + a_3). \end{aligned}$$
(4.5)

It remains to find the constant \(c\). Putting \(a = a_1 a_2 a_3\), it should be clear that, for all \(x > 0\) and \(t>0\) small enough,

$$\begin{aligned} F(x)\!=\! \mathbf{P}\{S \le x\}\!=\! \frac{x^3}{6a}, \quad p(x)\! =\! \frac{x^2}{2a}, \quad F^{-1}(t) \!=\! (6at)^{1/3}, \quad L(t) \!=\! \frac{1}{2a}\,(6at)^{2/3}, \end{aligned}$$

and finally \(L^{\prime }(t)^2\, L(t) = \frac{2}{a}\). Hence, \(c = \frac{2}{a}\). Thus, in (4.5) we arrive at \(I(S) \le \frac{2}{a}\,(a_1 + a_2 + a_3)\), which is exactly (4.2). \(\square \)

Lemma 4.2 allows us to reduce Proposition 4.1 to the case of uniform distributions. Note that if a density \(p\) is written as a convex mixture

$$\begin{aligned} p(x) = \int _\mathfrak{P } q(x)\,d\pi (q), \end{aligned}$$
(4.6)

then by the convexity of the total variation norm,

$$\begin{aligned} \Vert p\Vert _\mathrm{TV} \le \int _\mathfrak{P } \Vert q\Vert _\mathrm{TV}\,d\pi (q). \end{aligned}$$
(4.7)

Recall that we understand (4.6) as the equality (3.7) of the corresponding measures. So, (4.7) uses our original agreement that, for each \(x\), the value \(p(x)\) lies in the closed segment with endpoints \(p(x-)\) and \(p(x+)\).

In order to apply Lemma 4.2 together with Jensen’s inequality for the Fisher information, we need, however, first to require that \(\pi \) be supported on uniform densities (that is, densities of normalized Lebesgue measures on finite intervals), and secondly to reverse (4.7). This indeed turns out to be possible, which may be a rather interesting observation in its own right.

Lemma 4.3

Any density \(p\) of bounded variation can be represented as a convex mixture (4.6) of uniform densities with a mixing measure \(\pi \) such that

$$\begin{aligned} \Vert p\Vert _\mathrm{TV} = \int _\mathfrak{P } \Vert q\Vert _\mathrm{TV}\,d\pi (q). \end{aligned}$$
(4.8)

For example, if \(p\) is supported and non-increasing on \((0,+\infty )\), there is a canonical representation

$$\begin{aligned} p(x) = \int _0^\infty \frac{1}{x_1}\,1_{\{0 < x < x_1\}}\,d\pi (x_1) \quad \mathrm{a.e.} \end{aligned}$$

with a unique mixing probability measure \(\pi \) on \((0,\infty )\). In this case \(\Vert p\Vert _\mathrm{TV} = 2p(0+)\), and (4.8) is obvious. One may write a similar representation for densities of unimodal distributions. In general, another way to write (4.6) and (4.8) is

$$\begin{aligned} p(x)&= \int _{x_1>x_0} \frac{1}{x_1 - x_0}\,1_{\{x_0 < x < x_1\}}\, d\pi (x_0,x_1), \\ \Vert p\Vert _\mathrm{TV}&= 2 \int _{x_1>x_0} \frac{1}{x_1 - x_0}\ d\pi (x_0,x_1), \end{aligned}$$

where \(\pi \) is a Borel probability measure on the half-plane \(x_1 > x_0\) (i.e., above the main diagonal). It was noticed by Maurey [17] that a mixing measure \(\pi \) satisfying (4.6) and (4.8) is not unique in general. This can be seen from the example \(p(x) = \frac{1}{4}\cdot 1_{\{0<x<3\}} + \frac{1}{4}\cdot 1_{\{1<x<2\}}\).
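To make the canonical representation above concrete (an illustration of ours): for the exponential density \(p(x) = e^{-x}\) \((x>0)\) the mixing measure is \(d\pi (x_1) = x_1 e^{-x_1}\,dx_1\), a probability measure on \((0,\infty )\), and since \(\Vert \frac{1}{x_1}\,1_{(0,x_1)}\Vert _\mathrm{TV} = \frac{2}{x_1}\),

$$\begin{aligned} \int _0^\infty \frac{1}{x_1}\,1_{\{0 < x < x_1\}}\, x_1 e^{-x_1}\,dx_1 \, = \, \int _x^\infty e^{-x_1}\,dx_1 \, = \, e^{-x} \, = \, p(x), \qquad \int _0^\infty \frac{2}{x_1}\, x_1 e^{-x_1}\,dx_1 \, = \, 2 \, = \, \Vert p\Vert _\mathrm{TV}, \end{aligned}$$

in agreement with (4.6) and (4.8).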

Let us also note that the sets \(\mathrm{BV}(c)\) of all densities \(p\) with \(\Vert p\Vert _\mathrm{TV} \le c\) are closed under the weak convergence (3.3) of the corresponding probability distributions. Moreover, the weak convergence in \(\mathrm{BV}(c)\) coincides with convergence in \(L^1\)-norm, which can be proved using the same arguments as in the proof of Proposition 3.2. In particular, the functional \(q \rightarrow \Vert q\Vert _\mathrm{TV}\) is lower semi-continuous and hence Borel measurable on \(\mathfrak{P }\), so the integrals (4.7)–(4.8) make sense.

Denote by \(U\) the collection of all uniform densities which thus may be identified with the half-plane \(\tilde{U} = \{(a,b) \in \mathbf{R}^2: b > a\}\) via the map \((a,b) \rightarrow q_{a,b}(x) = \frac{1}{b-a}\,1_{\{a<x<b\}}\). The usual convergence on \(\tilde{U}\) in the Euclidean metric coincides with the weak convergence (3.3) of \(q_{a,b}\). The closure of \(U\) for the weak topology contains \(U\) and all delta-measures, hence \(U\) is a Borel measurable subset of \(\mathfrak{P }\).

Proof

We split the argument into two steps.

Step 1. First consider the discrete case, where \(p\) is piecewise constant. That is, assume that \(p\) is supported and constant on consecutive semiopen intervals \(\Delta _k = [x_{k-1},x_k)\), \(k = 1,\ldots , n\), where \(x_0 < \cdots < x_n\). Putting \(p(x) = c_k\) on \(\Delta _k\), we then have

$$\begin{aligned} \Vert p\Vert _\mathrm{TV} = c_1 + |c_2 - c_1| + \cdots + |c_n - c_{n-1}| + c_n. \end{aligned}$$

In this case the existence of the representation (4.6), moreover—with a discrete mixing measure \(\pi \), satisfying (4.8), can be proved by induction on \(n\).

If \(n = 1\), there is nothing to prove. For \(n = 2\), if \(c_1 = c_2\) or \(\min (c_1,c_2) = 0\), we are reduced to the case \(n=1\). Otherwise, let for definiteness \(c_2 > c_1 > 0\). Then one can write

$$\begin{aligned} p = c_1\, 1_{[x_0,x_2)} + (c_2-c_1)\, 1_{[x_1,x_2)} = \alpha _1\, q_1 + \alpha _2\, q_2, \end{aligned}$$

where \(q_1\) is the uniform density on \(\Delta _1 \cup \Delta _2\) and \(q_2\) is the uniform density on \(\Delta _2\) (with certain \(\alpha _1,\alpha _2>0,\;\alpha _1 + \alpha _2 = 1\)). This representation corresponds to (4.6) with \(\pi \) having the atoms at \(q_1\) and \(q_2\). In addition,

$$\begin{aligned} \alpha _1\, \Vert q_1\Vert _\mathrm{TV} + \alpha _2\, \Vert q_2\Vert _\mathrm{TV} = \Vert c_1\, 1_{[x_0,x_2)}\Vert _\mathrm{TV} + \Vert (c_2-c_1)\, 1_{[x_1,x_2)}\Vert _\mathrm{TV} = 2c_2 = \Vert p\Vert _\mathrm{TV}, \end{aligned}$$

so (4.8) is fulfilled.

If \(n \ge 3\), first we distinguish between several cases. If \(c_1 = 0\) or \(c_n = 0\), we are reduced to the smaller number of supporting intervals. If \(c_k = 0\) for some \(1 < k < n\), one can write \(p = f + g\) with \(f(x) = p(x)\,1_{\{x<x_{k-1}\}}\), \(g(x) = p(x)\,1_{\{x \ge x_k\}}\). These functions are supported on disjoint half-axes, so \(\Vert p\Vert _\mathrm{TV} = \Vert f\Vert _\mathrm{TV} + \Vert g\Vert _\mathrm{TV}\). Moreover, the induction hypothesis may be applied to both \(f\) and \(g\) (or one can first normalize these functions to work with densities, but this is less convenient). As a result,

$$\begin{aligned} f = f_1 + \cdots + f_k, \quad g = g_1 + \cdots + g_l \quad \mathrm{a.e.} \end{aligned}$$

where each \(f_i\) is supported and constant on some interval inside \([x_0,x_{k-1})\), each \(g_j\) is supported and constant on some interval inside \([x_k,x_n)\), and

$$\begin{aligned} \Vert f\Vert _\mathrm{TV} = \Vert f_1\Vert _\mathrm{TV} + \cdots + \Vert f_k\Vert _\mathrm{TV}, \quad \Vert g\Vert _\mathrm{TV} = \Vert g_1\Vert _\mathrm{TV} + \cdots + \Vert g_l\Vert _\mathrm{TV}. \end{aligned}$$

Hence,

$$\begin{aligned} p = \sum _i f_i + \sum _j g_j \quad \mathrm{with} \quad \Vert p\Vert _\mathrm{TV} = \sum _i \Vert f_i\Vert _\mathrm{TV} + \sum _j \Vert g_j\Vert _\mathrm{TV}. \end{aligned}$$

Finally, assume that \(c_k > 0\) for all \(k \le n\). Putting \(c_* = \min _k c_k\), write \(p = f + g\), where \(f = c_*\, 1_{[x_0,x_n)}\) and \(g\) thus takes the values \(c_k - c_*\) on \(\Delta _k\). Clearly,

$$\begin{aligned} \Vert p\Vert _\mathrm{TV} = 2c_* + \Vert g\Vert _\mathrm{TV} = \Vert f\Vert _\mathrm{TV} + \Vert g\Vert _\mathrm{TV}. \end{aligned}$$

By the definition, \(g\) takes the value zero on one of the intervals (where \(c_k = c_*\)), so we are reduced to the previous step. On that step, we obtained a representation \(g = g_1 + \cdots + g_l\) such that \(\Vert g\Vert _\mathrm{TV} = \Vert g_1\Vert _\mathrm{TV} + \cdots + \Vert g_l\Vert _\mathrm{TV}\), where each \(g_j\) is supported and constant on some interval inside \([x_0,x_n)\). Hence,

$$\begin{aligned} p = f + \sum _j g_j \quad \mathrm{with} \quad \Vert p\Vert _\mathrm{TV} = \Vert f\Vert _\mathrm{TV} + \sum _j \Vert g_j\Vert _\mathrm{TV}. \end{aligned}$$

Although the measure \(\pi \) has not been constructed explicitly, one may notice that it should be supported on densities of the form

$$\begin{aligned} q_{ij}(x) = \frac{1}{x_j - x_i}\,1_{\{x_i \le x < x_j\}}, \quad 0 \le i < j \le n. \end{aligned}$$

Step 2. (Approximation) In the general case, one may assume that \(p\) is right-continuous. Consider the collection of piecewise constant densities of the form

$$\begin{aligned} \tilde{p}(x) = dq(x), \quad q(x) = \sum _{k=1}^N c_k \,1_{\{x_{k-1} \le x < x_k\}}, \quad c_k \ = \min _{x_{k-1} \le x \le x_k} p(x), \end{aligned}$$
(4.9)

with arbitrary points \(x_0 < \cdots < x_N\) of continuity of \(p\), such that \(p\) is not vanishing on \((x_0,x_N)\), and where \(d \ge 1\) is a normalizing constant so that \(\int _{-\infty }^\infty \tilde{p}(x)\,dx = 1\). Denoting by \(y_k\) a point of minimum of \(p\) on \([x_{k-1},x_k]\), we first note that

$$\begin{aligned} \frac{1}{d}\,\Vert \tilde{p}\Vert _\mathrm{TV} \, = \, \Vert q\Vert _\mathrm{TV} \, = \, p(y_1) + p(y_N) + \sum _{k=2}^{N} |p(y_k) - p(y_{k-1})| \, \le \, \Vert p\Vert _\mathrm{TV}. \end{aligned}$$

If the endpoints \(x_0\) and \(x_N\) are fixed, while the maximal step of the partition \(\max _k\, (x_k - x_{k-1})\) becomes small, the integral \(\int _{x_0}^{x_N} q(x)\,dx\) will approximate \(\int _{x_0}^{x_N} p(x)\,dx\) (since \(p\) has bounded total variation). Hence, it is possible to construct a sequence \(p_n(x) = d_n q_n(x)\) of the form (4.9) which converges to \(p\) in the \(L^1\)-norm and with \(d_n \rightarrow 1\). By construction,

$$\begin{aligned} p_n(x) \le d_n p(x) \quad \mathrm{and} \quad \Vert p_n\Vert _\mathrm{TV} \le d_n\,\Vert p\Vert _\mathrm{TV}. \end{aligned}$$
(4.10)

Now, using the previous step, one can define discrete probability measures \(\pi _n\) supported on \(U\) and such that

$$\begin{aligned} p_n(x) = \int _U q(x)\,d\pi _n(q), \quad \Vert p_n\Vert _\mathrm{TV} = \int _U \Vert q\Vert _\mathrm{TV}\,d\pi _n(q). \end{aligned}$$
(4.11)

Since \(U\) has been identified with the half-plane \(\tilde{U}\), replacing \(d\pi _n(q)\) with \(d\pi _n(a,b)\) should not lead to confusion. In particular, the second equality in (4.11) may be written as

$$\begin{aligned} \Vert p_n\Vert _\mathrm{TV} = 2\int _{\tilde{U}} \frac{1}{b-a}\,d\pi _n(a,b). \end{aligned}$$
(4.12)

Let \(n\) be large enough, say \(n \ge n_0\) (so that \(d_n \le 2\)). From the first equality in (4.11) and by (4.10), it then follows that, for any \(T>0\),

$$\begin{aligned} \int _U \left[ \int _{|x| \ge T} q(x)\,dx\right] \,d\pi _n(q) \, = \, \int _{|x| \ge T} p_n(x)\,dx \, \le \, 2\int _{|x| \ge T} p(x)\,dx. \end{aligned}$$

Hence, by Chebyshev’s inequality, the sets \(U(\varepsilon ,T) = \{q \in U: \int _{|x| \ge T}\, q(x)\,dx > \varepsilon \}\) have \(\pi _n\)-measure

$$\begin{aligned} \pi _n(U(\varepsilon ,T)) \le \frac{2}{\varepsilon } \int _{|x| \ge T} p(x)\,dx \quad (\varepsilon , T > 0). \end{aligned}$$
(4.13)

Next we choose two sequences \(\varepsilon = \varepsilon _k \downarrow 0\) and \(T = T_k \uparrow \infty \), for which the right-hand side of (4.13), say \(\delta _k\), will tend to zero sufficiently fast, as \(k \rightarrow \infty \). Let \(\delta _k < 2^{-k}\). Identifying \(q\) with corresponding probability distributions, by the Prokhorov compactness criterion (cf. e.g. [6]), the collection of densities

$$\begin{aligned} F_k = \bigcap _{l = k}^\infty \left\{ q \in \mathfrak{P }: \int _{|x| \ge T_l} q(x)\,dx \le \varepsilon _l\right\} \end{aligned}$$

is pre-compact in the space \(M(\mathbf{R})\) of all probability distributions on the real line with the weak topology. Moreover, by (4.13),

$$\begin{aligned} 1 - \pi _n(F_k) \le \sum _{l=k}^\infty \pi _n(U(\varepsilon _l,T_l)) \le \sum _{l=k}^\infty \delta _l < 2^{-(k-1)}. \end{aligned}$$

Therefore, by the same criterion, but now applied to the Polish space \(M(M(\mathbf{R}))\) of all probability distributions on \(M(\mathbf{R})\) (with the weak topology), \(\pi _n\) contains a weakly convergent subsequence \(\pi _{n_k}\) with some limit \(\pi \). This measure is supported on the weak closure of \(U\), which is a larger set, since it contains delta-measures, or the main diagonal in \(\mathbf{R}^2\), if we identify \(U\) with \(\tilde{U}\). However, using (4.12) together with Chebyshev’s inequality, and then applying (4.10), we see that, for any \(\varepsilon > 0\) and all \(n \ge n_0\),

$$\begin{aligned} \pi _n\{(a,b): b - a < \varepsilon \} \, = \, \pi _n\left\{ (a,b): \frac{1}{b - a} > \frac{1}{\varepsilon }\right\} \, \le \, \frac{\varepsilon }{2}\, \Vert p_n\Vert _\mathrm{TV} \, \le \, \varepsilon \,\Vert p\Vert _\mathrm{TV}. \end{aligned}$$

Since \(\varepsilon >0\) is arbitrary, we conclude that \(\pi \) is actually supported on \(U\).

Moreover, taking the limit along \(n_k\) in the first equality in (4.11), we obtain the representation (4.6). Indeed, (4.11) implies that, for all \(a<b\),

$$\begin{aligned} \int _a^b p_n(x)\,dx = \int _U \left[ \int _a^b q(x)\,dx\right] \,d\pi _n(q). \end{aligned}$$

The functional \(q \rightarrow \int _a^b q(x)\,dx\) is bounded and continuous on the space \(\mathfrak{P }\) with the weak topology (3.3), so the limit yields a similar equality

$$\begin{aligned} \int _a^b p(x)\,dx = \int _U \left[ \int _a^b q(x)\,dx\right] \,d\pi (q). \end{aligned}$$

But the latter is equivalent to (4.6).

Finally, the sets \(G(t) = \{q \in U:\Vert q\Vert _\mathrm{TV} > t\}\) are open in the weak topology (by the lower semicontinuity of the total variation norm), hence, \(\liminf _{k \rightarrow \infty } \pi _{n_k}(G(t)) \ge \pi (G(t))\). Applying Fatou’s lemma and then again (4.10) and the second equality in (4.11), we get

$$\begin{aligned} \int _U \Vert q\Vert _\mathrm{TV}\,d\pi (q)&= \int _0^\infty \pi (G(t))\,dt \ \le \ \liminf _{k \rightarrow \infty }\, \int _0^\infty \pi _{n_k}(G(t))\,dt \\&= \liminf _{k \rightarrow \infty }\, \int _U \Vert q\Vert _\mathrm{TV}\,d\pi _{n_k}(q)= \liminf _{k \rightarrow \infty }\, \Vert p_{n_k}\Vert _\mathrm{TV} \ \le \ \Vert p\Vert _\mathrm{TV}. \end{aligned}$$

In view of Jensen’s inequality (4.7), we obtain (4.8) thus proving the lemma. \(\square \)

Proof of Proposition 4.1

We may write down the representation (4.6) from Lemma 4.3 for each of the densities \(p_j\) \((j=1,2,3)\). That is,

$$\begin{aligned} p_j(x) = \int q(x)\,d\pi _j(q) \quad \mathrm{a.e.} \end{aligned}$$

with some mixing probability measures \(\pi _j\), supported on \(U\) and satisfying

$$\begin{aligned} \Vert p_j\Vert _\mathrm{TV} = \int \Vert q\Vert _\mathrm{TV}\,d\pi _j(q). \end{aligned}$$
(4.14)

Taking the convolution, we have a similar representation

$$\begin{aligned} (p_1 * p_2 * p_3)(x) \, = \, \int \!\!\!\int \!\!\!\int (q_1 * q_2 * q_3)(x)\ d\pi _1(q_1) d\pi _2(q_2) d\pi _3(q_3) \quad \mathrm{a.e.} \end{aligned}$$

One can now use Jensen’s inequality (3.8) for the Fisher information and apply (4.2) to bound \(I(p_1 * p_2 * p_3)\) from above by

$$\begin{aligned} \frac{1}{2} \int \!\!\!\int \!\!\!\int \big [\, \Vert q_1\Vert _{\mathrm{TV}} \, \Vert q_2\Vert _{\mathrm{TV}} + \Vert q_1\Vert _{\mathrm{TV}} \, \Vert q_3\Vert _{\mathrm{TV}} + \Vert q_2\Vert _{\mathrm{TV}} \, \Vert q_3\Vert _{\mathrm{TV}}\big ] \ d\pi _1(q_1) d\pi _2(q_2) d\pi _3(q_3). \end{aligned}$$

In view of (4.14), the triple integral coincides with the right-hand side of (4.1). \(\square \)
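As a quick numerical sanity check of Proposition 4.1 (not part of the argument), one may take three independent Laplace variables with common density \(\frac{1}{2}e^{-|x|}\): each factor has total variation norm 1, so the right-hand side of (4.1) equals \(3/2\), while the Fisher information of the triple convolution can be evaluated on a grid and comes out well below this value. A minimal sketch, assuming only numpy:

```python
import numpy as np

dx = 0.01
x = np.arange(-40, 40 + dx, dx)
q = 0.5 * np.exp(-np.abs(x))          # Laplace density: ||q||_TV = 1, I(q) = 1

# Density of the sum of three independent Laplace variables (grid convolution).
p = np.convolve(np.convolve(q, q, mode="same") * dx, q, mode="same") * dx

# Fisher information of the triple convolution, by numerical differentiation.
dp = np.gradient(p, dx)
fisher = np.sum(dp**2 / p) * dx

# Right-hand side of (4.1): half the sum of pairwise products of TV norms.
bound = 0.5 * (1 * 1 + 1 * 1 + 1 * 1)
print(f"I(q*q*q) ~ {fisher:.4f}  <=  {bound}")
```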

5 Bounds in terms of characteristic functions

In view of Proposition 4.1, let us describe how to bound the total variation norm of a given density \(p\) of a random variable \(X\) in terms of the characteristic function

$$\begin{aligned} f(t) = \mathbf{E}\, e^{itX} = \int _{-\infty }^\infty e^{itx} p(x)\,dx. \end{aligned}$$

There are many different bounds, depending on the integrability properties of \(f\) and its derivatives, as well as on moment assumptions on \(X\). We shall present two of them here.

Recall that, if \(p\) is absolutely continuous, then

$$\begin{aligned} \Vert p\Vert _{\mathrm{TV}} \, = \, \int _{-\infty }^\infty |p^{\prime }(x)|\,dx. \end{aligned}$$

Proposition 5.1

If \(X\) has finite second moment and

$$\begin{aligned} \int _{-\infty }^\infty |t|\,\left( |f(t)| + |f^{\prime }(t)| + |f^{\prime \prime }(t)|\right) \,dt < \infty , \end{aligned}$$
(5.1)

then \(X\) has a continuously differentiable density \(p\) with finite total variation

$$\begin{aligned} \Vert p\Vert _{\mathrm{TV}} \, \le \, \frac{1}{2}\, \int _{-\infty }^\infty \left( |tf^{\prime \prime }(t)| + 2\,|f^{\prime }(t)| + |t f(t)|\right) \,dt. \end{aligned}$$
(5.2)

Proof

The argument is standard, and we recall it here for completeness.

First, by the moment assumption, \(f\) is twice continuously differentiable. Using the inverse Fourier transform, the assumption (5.1) implies that \(X\) has a continuously differentiable density

$$\begin{aligned} p(x) = \frac{1}{2\pi }\, \int _{-\infty }^\infty e^{-itx} f(t)\,dt \end{aligned}$$
(5.3)

with derivative

$$\begin{aligned} p^{\prime }(x) = -\frac{i}{2\pi }\, \int _{-\infty }^\infty e^{-itx}\, tf(t)\,dt. \end{aligned}$$
(5.4)

By the Riemann–Lebesgue theorem, \(f(t) \rightarrow 0\), as \(|t| \rightarrow \infty \), and the same is true for the derivatives \(f^{\prime }(t)\) and \(f^{\prime \prime }(t)\) (since they are Fourier transforms of integrable functions). Therefore, one may integrate in (5.3) by parts to get, for all \(x \in \mathbf{R}\),

$$\begin{aligned} x p(x) = - \frac{i}{2\pi }\, \int _{-\infty }^\infty e^{-itx} f^{\prime }(t)\,dt \end{aligned}$$
(5.5)

and

$$\begin{aligned} x^2 p(x) = -\frac{1}{2\pi }\, \int _{-\infty }^\infty e^{-itx} f^{\prime \prime }(t)\,dt. \end{aligned}$$

By (5.1), we may differentiate the last equality under the integral sign, which together with (5.4) and (5.5) gives

$$\begin{aligned} (1+x^2) p^{\prime }(x) \, = \, \frac{i}{2\pi }\, \int _{-\infty }^\infty e^{-itx}\, \left( tf^{\prime \prime }(t) + 2f^{\prime }(t) - tf(t)\right) \,dt. \end{aligned}$$

Hence, \(|p^{\prime }(x)| \le \frac{C}{2\pi \,(1 + x^2)}\) with a constant described as the integral in (5.2). After integration of this pointwise bound, the proposition follows. \(\square \)
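For orientation, here is a small numerical check of (5.2), which is not part of the proof: for the standard normal density one has \(\Vert \varphi \Vert _{\mathrm{TV}} = 2\varphi (0) \approx 0.80\), while the right-hand side of (5.2) is evaluated by quadrature in the sketch below (assuming only numpy/scipy).

```python
import numpy as np
from scipy.integrate import quad

# Standard normal: f(t) = exp(-t^2/2), f'(t) = -t f(t), f''(t) = (t^2 - 1) f(t).
f  = lambda t: np.exp(-t**2 / 2)
f1 = lambda t: -t * f(t)
f2 = lambda t: (t**2 - 1) * f(t)

# Right-hand side of (5.2), evaluated by quadrature.
integrand = lambda t: abs(t * f2(t)) + 2 * abs(f1(t)) + abs(t * f(t))
rhs = 0.5 * quad(integrand, -np.inf, np.inf)[0]

# Exact total variation norm of the normal density: ||phi||_TV = 2 phi(0).
tv = 2 / np.sqrt(2 * np.pi)
print(f"||phi||_TV = {tv:.4f}  <=  RHS of (5.2) = {rhs:.4f}")
```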

One can get rid of the second derivative in the bound above and remove any moment assumption from Proposition 5.1. However, we still need to insist on certain integrability and differentiability properties of the characteristic function on the positive half-axis.

Proposition 5.2

Assume that the characteristic function \(f(t)\) of a random variable \(X\) has a continuous derivative for \(t>0\) with

$$\begin{aligned} \int _{-\infty }^\infty t^2\,\left( |f(t)|^2 + |f^{\prime }(t)|^2\right) \,dt < \infty . \end{aligned}$$
(5.6)

Then \(X\) has an absolutely continuous density \(p\) with finite total variation

$$\begin{aligned} \Vert p\Vert _{\mathrm{TV}} \, \le \, \left( \,\,\int _{-\infty }^\infty |t f(t)|^2\,dt \int _{-\infty }^\infty |(tf(t))^{\prime }|^2\,dt\right) ^{1/4}. \end{aligned}$$
(5.7)

Proof

First assume additionally that \(f\) decays at infinity sufficiently fast. Then \(tf(t)\) is integrable, so that \(X\) has a smooth density \(p\) with derivative \(p^{\prime }\) represented by (5.4). One may integrate therein by parts over the intervals \((-T,-\varepsilon )\) and \((\varepsilon ,T)\) with \(\varepsilon \downarrow 0, T \uparrow \infty \), using the property that \((tf(t))^{\prime }\) is integrable near zero. Then we get in the limit a similar representation

$$\begin{aligned} x p^{\prime }(x) = -\frac{1}{2\pi } \int _{-\infty }^\infty e^{-itx}\, (t f(t))^{\prime }\,dt, \end{aligned}$$

where the integral is understood in the improper sense (at infinity), and the resulting function belongs to \(L^2(-\infty ,\infty )\). Write \(|p^{\prime }(x)| = \frac{1}{|1 + ix|}\,|(1 + ix)\, p^{\prime }(x)|\) and use Cauchy’s inequality together with Plancherel’s formula to get

$$\begin{aligned} \left( \,\,\int _{-\infty }^\infty |p^{\prime }(x)|\,dx\right) ^2&\le \int _{-\infty }^\infty \frac{dx}{1+x^2} \ \int _{-\infty }^\infty (1+x^2)\,p^{\prime }(x)^2\,dx \\&= \frac{1}{2}\,\int _{-\infty }^\infty \big [|t f(t)|^2 + |(tf(t))^{\prime }|^2 \big ]\,dt. \end{aligned}$$

Applying the same inequality to \(\lambda X\) and optimizing over \(\lambda > 0\), we arrive at (5.7).

In the general case, one may apply (5.7) to the regularized random variables \(X_\sigma = X + \sigma Z\) with small parameter \(\sigma >0\), where \(Z \sim N(0,1)\) is independent of \(X\). They have smooth densities \(p_\sigma \) and characteristic functions \(f_\sigma (t) = f(t)\, e^{-\sigma ^2 t^2/2}\). Repeating the previous argument for the difference of densities, we obtain an analogue of (5.7),

$$\begin{aligned} \Vert p_{\sigma _1}\!-\! p_{\sigma _2}\Vert _{\mathrm{TV}}^4 \!\le \! \int _{-\infty }^\infty |t\, (f_{\sigma _1}(t) \!-\! f_{\sigma _2}(t))|^2\,dt \int _{-\infty }^\infty |(t\, (f_{\sigma _1}(t) \!-\! f_{\sigma _2}(t)) )^{\prime }|^2\,dt \qquad \end{aligned}$$
(5.8)

with arbitrary \(\sigma _1,\sigma _2 > 0\). Since the integrals in (5.7) are finite, by the Lebesgue dominated convergence theorem, the right-hand side of (5.8) tends to zero, as \(\sigma _1,\sigma _2 \rightarrow 0\). Hence, the family \(\{p_\sigma \}\) is fundamental (Cauchy), as \(\sigma \rightarrow 0\), in the Banach space of all functions of bounded variation on the real line that vanish at infinity. As a result, the limit \(p = \lim _{\sigma \rightarrow 0} p_\sigma \) exists in this space with respect to the total variation norm.

Necessarily, \(p(x) \ge 0\), for all \(x\), and \(\int _{-\infty }^\infty p(x)\,dx = 1\). Hence, \(X\) has an absolutely continuous distribution with density \(p\). In addition, by (5.7) applied to \(p_\sigma \),

$$\begin{aligned} \Vert p\Vert _{\mathrm{TV}} \, = \, \lim _{\sigma \rightarrow 0} \, \Vert p_\sigma \Vert _{\mathrm{TV}} \, \le \, \lim _{\sigma \rightarrow 0} \, \left( \,\,\int _{-\infty }^\infty |t f_\sigma (t)|^2\,dt \int _{-\infty }^\infty |(tf_\sigma (t))^{\prime }|^2\,dt\right) ^{1/4}. \end{aligned}$$

The last limit exists and coincides with the right-hand side of (5.7).

Finally, using Plancherel’s formula in (5.4) for the regularized random variables, we have, for all \(\sigma _1,\sigma _2 > 0\),

$$\begin{aligned} \int _{-\infty }^\infty |p_{\sigma _1}^{\prime }(x) - p_{\sigma _2}^{\prime }(x)|^2\,dx = \frac{1}{2\pi }\,\int _{-\infty }^\infty t^2\, |f_{\sigma _1}(t) - f_{\sigma _2}(t)|^2\,dt. \end{aligned}$$

This relation shows that \(\{p_\sigma \}\) is a fundamental family also in the Sobolev space \(W_1^2(-\infty ,\infty )\), and necessarily \(p = \lim _{\sigma \rightarrow 0} p_\sigma \) in the norm of \(W_1^2\). Thus, \(p\) belongs to \(W_1^2\) and is therefore absolutely continuous. \(\square \)

Combining Proposition 4.1 with Propositions 5.1–5.2, one can bound the Fisher information of the sum of three independent random variables in terms of their characteristic functions. In particular, in the i.i.d. case, we have:

Corollary 5.3

If the independent random variables \(X_1,X_2,X_3\) have finite first absolute moment and a common characteristic function \(f(t)\), then

$$\begin{aligned} I(X_1 + X_2 + X_3) \, \le \, \frac{3}{2}\, \left( \,\,\int _{-\infty }^\infty |t f(t)|^2\,dt \int _{-\infty }^\infty |(tf(t))^{\prime }|^2\,dt\right) ^{1/2}. \end{aligned}$$
(5.9)

If \(X_1\) has finite second moment, we also have

$$\begin{aligned} I(X_1 + X_2 + X_3) \, \le \, \frac{3}{8}\,\left( \,\,\int _{-\infty }^\infty \left( |tf^{\prime \prime }(t)| + 2\,|f^{\prime }(t)| + |t f(t)|\right) \,dt\right) ^2. \end{aligned}$$

It is interesting to note that, in turn, the first integral in (5.9) is bounded from above by \(I(X_1)^{3/2}\) up to a constant (Proposition 2.4). The same can also be shown for the second integral under the 4th moment assumption (cf. Sect. 7).
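As a sanity check of (5.9) (again only an illustration): for three standard normal summands, \(I(X_1+X_2+X_3) = 1/3\), and the two integrals in (5.9) can be computed by quadrature. A minimal sketch assuming numpy/scipy:

```python
import numpy as np
from scipy.integrate import quad

# Characteristic function of N(0,1) and the derivative of t f(t).
f        = lambda t: np.exp(-t**2 / 2)
tf       = lambda t: t * f(t)
tf_prime = lambda t: (1 - t**2) * f(t)

A = quad(lambda t: tf(t)**2, -np.inf, np.inf)[0]        # int |t f(t)|^2 dt
B = quad(lambda t: tf_prime(t)**2, -np.inf, np.inf)[0]  # int |(t f(t))'|^2 dt

bound = 1.5 * np.sqrt(A * B)    # right-hand side of (5.9)
exact = 1 / 3                   # I(X1 + X2 + X3) for three standard normal summands
print(f"I = {exact:.4f}  <=  bound (5.9) = {bound:.4f}")
```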

6 Classes of densities representable as convolutions

General bounds like those in Proposition 2.2 may be considerably sharpened in the case where \(p\) is representable as a convolution of several densities with finite Fisher information.

Definition 6.1

Given an integer \(k \ge 1\) and a real number \(I > 0\), denote by \(\mathfrak{P }_k(I)\) the collection of all functions on the real line which can be represented as convolution of \(k\) probability densities with Fisher information at most \(I\). Correspondingly, let

$$\begin{aligned} \mathfrak{P }_k = \cup _{I>0}\, \mathfrak{P }_k(I) \end{aligned}$$

denote the collection of all functions representable as convolution of \(k\) probability densities with finite Fisher information.

The collection \(\mathfrak{P }_1\) of all densities with finite Fisher information has already been discussed in connection with general properties of the functional \(I\). For growing \(k\), the classes \(\mathfrak{P }_k(I)\) decrease, since the Fisher information may only decrease when an independent summand is added. This also follows from the following general inequality of Stam

$$\begin{aligned} \frac{1}{I(X_1 + \cdots + X_k)} \, \ge \, \frac{1}{I(X_1)} + \cdots + \frac{1}{I(X_k)}, \end{aligned}$$
(6.1)

which holds for all independent random variables \(X_1,\ldots ,X_k\) (cf. [7, 15, 22]). Moreover, it implies that \(p = p_1 * \cdots * p_k \in \mathfrak{P }_1(I/k)\), as long as \(p_i \in \mathfrak{P }_1(I),\; i = 1,\ldots ,k\).
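For example, for the sum of a standard normal and a Laplace variable with density \(\frac{1}{2}e^{-|y|}\) (each having Fisher information 1), inequality (6.1) predicts \(I(X_1+X_2) \le 1/2\). The following sketch (plain numpy, grid convolution) evaluates the left-hand side numerically:

```python
import numpy as np

dx = 0.01
x = np.arange(-20, 20 + dx, dx)
phi = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)   # N(0,1) density,  I = 1
lap = 0.5 * np.exp(-np.abs(x))                 # Laplace density, I = 1

# Density of the sum X1 + X2 by discrete convolution on the grid.
p = np.convolve(phi, lap, mode="same") * dx

# Fisher information of the convolution.
dp = np.gradient(p, dx)
fisher = np.sum(dp**2 / p) * dx

print(f"I(X1 + X2) ~ {fisher:.4f}  <=  1/(1/I(X1) + 1/I(X2)) = 0.5")
```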

Any function \(p\) in \(\mathfrak{P }_k\) is \(k-1\) times differentiable, and its \((k-1)\)-th derivative is absolutely continuous and has a Radon–Nikodym derivative, which we denote by \(p^{(k)}\). Let us illustrate this property in the important case \(k=2\). Write

$$\begin{aligned} p(x) = \int _{-\infty }^\infty p_1(x-y) p_2(y)\,dy \end{aligned}$$
(6.2)

in terms of absolutely continuous densities \(p_1\) and \(p_2\) of independent summands \(X_1\) and \(X_2\) of a random variable \(X\) with density \(p\). Differentiating under the integral sign, we obtain a Radon–Nikodym derivative of the function \(p\),

$$\begin{aligned} p^{\prime }(x) = \int _{-\infty }^\infty p_1^{\prime }(x-y) p_2(y)\,dy = \int _{-\infty }^\infty p_1^{\prime }(y) p_2(x-y)\,dy. \end{aligned}$$
(6.3)

The latter expression shows that \(p^{\prime }\) is absolutely continuous and has a Radon–Nikodym derivative

$$\begin{aligned} p^{\prime \prime }(x) = \int _{-\infty }^\infty p_1^{\prime }(y) p_2^{\prime }(x-y)\,dy, \end{aligned}$$
(6.4)

which is well-defined for all \(x\). In other words, \(p^{\prime \prime }\) appears as the convolution of the functions \(p_1^{\prime }\) and \(p_2^{\prime }\) (which are integrable, according to Proposition 2.2).

These formulas may be used to derive a number of elementary relations within the class \(\mathfrak{P }_k\), and here we shall describe some of them for the cases \(\mathfrak{P }_2\) and \(\mathfrak{P }_3\).

Proposition 6.2

Given a density \(p \in \mathfrak{P }_2(I)\), for all \(x \in \mathbf{R}\),

$$\begin{aligned} |p^{\prime }(x)| \le \,I^{3/4} \sqrt{p(x)} \le I. \end{aligned}$$
(6.5)

Moreover, \(p^{\prime }\) has finite total variation

$$\begin{aligned} \Vert p^{\prime }\Vert _\mathrm{TV} = \int _{-\infty }^\infty |p^{\prime \prime }(x)|\,dx \le I. \end{aligned}$$

The last bound immediately follows from (6.4) and Proposition 2.2. To obtain the pointwise bound on the derivative, we appeal to Proposition 2.1 and rewrite the first equality in (6.3) as

$$\begin{aligned} p^{\prime }(x) = \int _{-\infty }^\infty \frac{p_1^{\prime }(x-y)}{\sqrt{p_1(x-y)}}\ 1_{\{p_1(x-y) > 0\}} \ \sqrt{p_1(x-y)}\, p_2(y)\,dy. \end{aligned}$$

By Cauchy’s inequality,

$$\begin{aligned} p^{\prime }(x)^2&\le I(X_1) \int _{-\infty }^\infty p_1(x-y)\, p_2(y)^2\,dy \\&\le I(X_1)\, \max _y p_2(y)\, \int _{-\infty }^\infty p_1(x-y)\, p_2(y)\,dy \ \le \ I(X_1) I(X_2)^{1/2}\, p(x), \end{aligned}$$

where we applied Proposition 2.2 to the random variable \(X_2\) on the last step. This gives the first inequality in (6.5), while the second follows from \(p(x) \le \sqrt{I}\).
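To illustrate (6.5) numerically (a sanity check only), take \(p = \varphi * \varphi \), the \(N(0,2)\) density, which lies in \(\mathfrak{P }_2(1)\); the bound then reads \(|p^{\prime }(x)| \le \sqrt{p(x)} \le 1\). A short numpy sketch:

```python
import numpy as np

# p = phi * phi is the N(0,2) density, a convolution of two densities
# with Fisher information 1, so p belongs to P_2(1).
p  = lambda x: np.exp(-x**2 / 4) / np.sqrt(4 * np.pi)
dp = lambda x: -(x / 2) * p(x)

xs = np.linspace(-10, 10, 2001)
gap = np.abs(dp(xs)) - np.sqrt(p(xs))   # should be <= 0 by (6.5) with I = 1
print("max of |p'| - sqrt(p):", np.max(gap))
```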

Now, we state similar bounds for the second derivative.

Proposition 6.3

For any density \(p \in \mathfrak{P }_2(I)\), we have \(p(x) = 0 \Rightarrow p^{\prime \prime }(x) = 0\), for all \(x\). Moreover,

$$\begin{aligned} \int _{\{p(x)>0\}} \frac{p^{\prime \prime }(x)^2}{p(x)}\,dx \le I^2. \end{aligned}$$

Proof

Let us start with the representation (6.4) for a fixed value \(x \in \mathbf{R}\), written (by the symmetry of convolution) in the equivalent form \(p^{\prime \prime }(x) = \int _{-\infty }^\infty p_1^{\prime }(x-y)\, p_2^{\prime }(y)\,dy\). By Proposition 2.1, this integral may be restricted to the set \(\{y:p_2(y)>0\}\). For the same reason, it may also be restricted to the set \(\{y:p_1(x-y)>0\}\). Hence,

$$\begin{aligned} p^{\prime \prime }(x) = \int _{-\infty }^\infty p_1^{\prime }(x-y)\, p_2^{\prime }(y)\,1_A(y)\,dy, \end{aligned}$$
(6.6)

where \(A = \{y: p_1(x-y)p_2(y)>0\}\). On the other hand, if \(p(x) = 0\), then the equality (6.2) implies that \(p_1(x-y)\, p_2(y) = 0\) for almost all \(y\). Therefore, \(1_A(y) = 0\) a.e., and thus the integral in (6.6) vanishes, that is, \(p^{\prime \prime }(x) = 0\).

Next, introduce the functions \(u_i(x) = \frac{p_i^{\prime }(x)}{\sqrt{p_i(x)}}\, 1_{\{p_i(x) > 0\}}\) (\(i = 1,2\)) and rewrite (6.4) as

$$\begin{aligned} p^{\prime \prime }(x) = \int _{-\infty }^\infty \left( u_1(x-y) u_2(y)\right) \, \sqrt{p_1(x-y)p_2(y)} \ dy. \end{aligned}$$

By Cauchy’s inequality,

$$\begin{aligned} p^{\prime \prime }(x)^2 \, \le \, \int _{-\infty }^\infty u_1(x-y)^2\, u_2(y)^2\, dy \, \int _{-\infty }^\infty p_1(x-y)p_2(y) \,dy \, = \, u(x)^2 p(x), \nonumber \\ \end{aligned}$$
(6.7)

where \(u \ge 0\) is defined by

$$\begin{aligned} u(x)^2 = \int _{-\infty }^\infty u_1(x-y)^2\, u_2(y)^2\, dy. \end{aligned}$$
(6.8)

Clearly,

$$\begin{aligned} \int _{-\infty }^\infty u(x)^2\, dx = I(X_1) I(X_2) \le I^2, \end{aligned}$$

which, together with (6.7), yields the inequality of the proposition: it remains to divide (6.7) by \(p(x)\) and integrate over the set \(\{p(x)>0\}\). \(\square \)

Proposition 6.4

Given a density \(p \in \mathfrak{P }_3(I)\), we have, for all \(x\),

$$\begin{aligned} |p^{\prime \prime }(x)| \le I^{5/4} \sqrt{p(x)} \le I^{3/2}. \end{aligned}$$
(6.9)

Proof

By the assumption, one may write \(p = p_1 * p_2\) with \(p_1 \in \mathfrak{P }_1(I)\) and \(p_2 \in \mathfrak{P }_2(I)\). Returning to (6.7)–(6.8) and applying Proposition 6.2 to \(p_2\), we get \(u_2(y) \le I^{3/4}\), so

$$\begin{aligned} u(x)^2 \le I^{3/2} \int _{-\infty }^\infty u_1(x-y)^2\, dy \le I^{5/2}. \end{aligned}$$

This proves the first inequality in (6.9). The second bound follows from the uniform bound \(p(x) \le \sqrt{I}\), cf. Proposition 2.2. \(\square \)

7 Bounds under moment assumptions

Another way to sharpen the bounds obtained in Sect. 2 for general densities with finite Fisher information is to invoke conditions on the absolute moments

$$\begin{aligned} \beta _s = \beta _s(X) = \mathbf{E}\, |X|^s \quad (s > 0 \ \ \mathrm{real}). \end{aligned}$$

By Proposition 2.1 and Cauchy’s inequality, if the Fisher information is finite,

$$\begin{aligned} \int _{-\infty }^\infty |x|^s\, |p^{\prime }(x)|\,dx&= \int _{\{p(x)>0\}} |x|^s p(x)^{1/2} \ \frac{|p^{\prime }(x)|}{p(x)^{1/2}} \ dx \\&\le \left( \int _{\{p(x)>0\}} |x|^{2s} p(x)\,dx\right) ^{1/2}\, \left( \int _{\{p(x)>0\}} \frac{p^{\prime }(x)^2}{p(x)}\,dx\right) ^{1/2}. \end{aligned}$$

Hence, we arrive at:

Proposition 7.1

If \(X\) has an absolutely continuous density \(p\), then, for any \(s>0\),

$$\begin{aligned} \int _{-\infty }^\infty |x|^s\, |p^{\prime }(x)|\,dx \le \sqrt{\beta _{2s} I(X)}. \end{aligned}$$

This bound holds irrespective of whether the Fisher information or the \(2s\)-th absolute moment \(\beta _{2s}\) is finite. Below we describe several applications of this proposition.

First, let us note that, when \(s \ge 1\), the function \(u(x) = (1+|x|^s) p(x)\) is (locally) absolutely continuous and has a Radon–Nikodym derivative satisfying

$$\begin{aligned} |u^{\prime }(x)| \le s |x|^{s-1}\,p(x) + (1+|x|^s)\, |p^{\prime }(x)|. \end{aligned}$$

Integrating this inequality and assuming for a moment that both \(I(X)\) and \(\beta _{2s}\) are finite, we see that \(u\) is a function of bounded variation. Since \(u\) is also integrable,

$$\begin{aligned} u(-\infty ) = \lim _{x \rightarrow -\infty } u(x) = 0, \quad u(\infty ) = \lim _{x \rightarrow \infty } u(x) = 0. \end{aligned}$$

Therefore, applying Propositions 2.2 and 7.1, we get

$$\begin{aligned} u(x) \, = \, \int _{-\infty }^x u^{\prime }(y)\,dy&\le \int _{-\infty }^\infty |u^{\prime }(y)|\,dy \\&\le s\int _{-\infty }^\infty |x|^{s-1}\,p(x)\,dx + \int _{-\infty }^\infty (1+|x|^s)\, |p^{\prime }(x)|\,dx \\&\le s \beta _{s-1} + \sqrt{I(X)} + \sqrt{\beta _{2s} I(X)}. \end{aligned}$$

One can summarize as follows.

Corollary 7.2

If \(X\) has density \(p\), then, given \(s \ge 1\), for any \(x \in \mathbf{R}\),

$$\begin{aligned} p(x) \le \frac{C}{1 + |x|^s} \end{aligned}$$

with \(C = s \beta _{s-1} + \sqrt{2(1+\beta _{2s})\, I(X)}\). If this constant is finite, we also have

$$\begin{aligned} \lim _{x \rightarrow \infty } \, (1 + |x|^s)\,p(x) = 0. \end{aligned}$$

Note that no additional requirements on the density are needed in the resulting inequality.

Under stronger moment assumptions, one can obtain better bounds for the decay of the density. For example, if for some \(\lambda > 0\), the exponential moment

$$\begin{aligned} \beta = \mathbf{E}\, e^{2\lambda |X|} = \int _{-\infty }^\infty e^{2\lambda |x|}\,p(x)\,dx \end{aligned}$$

is finite, then by similar arguments, \(p(x) \le C\,e^{-\lambda |x|}\), for any \(x \in \mathbf{R}\), with some constant \(C\) depending on \(\lambda \), \(\beta \), and \(I(X)\).

Applying Proposition 7.1 and Corollary 7.2 (the last assertion) with \(s=1\), we obtain the following analogue of Proposition 2.3.

Corollary 7.3

If \(X\) has finite second moment and finite Fisher information \(I(X)\), then for its characteristic function \(f(t) = \mathbf{E}\,e^{itX}\) we have

$$\begin{aligned} |f^{\prime }(t)| \le \frac{C}{|t|}, \quad t \in \mathbf{R}, \end{aligned}$$

with constant \(C = 1 + \sqrt{\beta _2 I(X)}\).

Indeed, if \(p\) is the density of \(X\) and \(t \ne 0\), one may integrate by parts

$$\begin{aligned} f^{\prime }(t) = \frac{1}{t} \int _{-\infty }^\infty x p(x)\,d e^{itx} \ = \ - \frac{1}{t} \int _{-\infty }^\infty (p(x) + x p^{\prime }(x))\,e^{itx}\,dx, \end{aligned}$$

which yields \(|tf^{\prime }(t)| \le 1 + \sqrt{\beta _2 I(X)}\).

One can also derive a similar integral bound with the help of Corollary 7.2 with \(s=2\), that is, assuming that \(\beta _4\) is finite. Alternatively (in order to improve the resulting constant), let us repeat the argument used in the proof of Corollary 7.2 with the particular function \(u(x) = x^2 p(x)\). Then we readily get

$$\begin{aligned} x^2 p(x) \le 2\beta _1 + \sqrt{\beta _4 I(X)}. \end{aligned}$$

But, by the Cramér–Rao inequality, \(\beta _4 I(X) \ge \beta _2^2 I(X) \ge \beta _2 \ge \beta _1^2\), and the above estimate is simplified to \(x^2 p(x) \le 3\sqrt{\beta _4 I(X)}\). Hence,

$$\begin{aligned} \int _{-\infty }^\infty x^2 p^{\prime }(x)^2\,dx \, = \, \int _{\{p(x)>0\}} x^2 p(x) \ \frac{p^{\prime }(x)^2}{p(x)} \ dx \, \le \, 3\sqrt{\beta _4 I(X)}\, I(X). \end{aligned}$$

Since \(xp^{\prime }(x)\) represents the inverse Fourier transform for \(-(tf(t))^{\prime }\), one may use the Plancherel formula which leads to:

Corollary 7.4

If \(X\) has finite \(4\)th moment and finite Fisher information \(I(X)\), then

$$\begin{aligned} \int _{-\infty }^\infty |(tf(t))^{\prime }|^2\,dt \, \le \, 6\pi \sqrt{\beta _4}\, I(X)^{3/2}. \end{aligned}$$

The left integral appeared in the bound (5.9) of Corollary 5.3. Combining Proposition 2.4 and Corollary 7.4, (5.9) may be thus complemented by a similar \(I\)-containing bound, namely,

$$\begin{aligned} I(X_1 + X_2 + X_3) \,&\le \, \frac{3}{2}\, \left( \,\,\int _{-\infty }^\infty |t f(t)|^2\,dt \int _{-\infty }^\infty |(tf(t))^{\prime }|^2\,dt\right) ^{1/2}\\&\le \, 3\sqrt{3}\,\pi \beta _4^{1/4}\, I(X_1)^{3/2}, \end{aligned}$$

where random variables \(X_1,X_2,X_3\) are independent and have a common characteristic function \(f(t)\) with finite 4th moment \(\beta _4 = \beta _4(X_1)\).
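To close the section with a quick numerical illustration (standard normal, \(\beta _2 = 1\), \(\beta _4 = 3\), \(I(X) = 1\)): Corollary 7.3 gives \(\sup _t |t f^{\prime }(t)| \le 1 + \sqrt{\beta _2 I(X)} = 2\), and Corollary 7.4 gives \(\int |(tf(t))^{\prime }|^2\,dt \le 6\pi \sqrt{\beta _4}\, I(X)^{3/2}\). The sketch below (numpy/scipy) evaluates both left-hand sides:

```python
import numpy as np
from scipy.integrate import quad

# Standard normal: f(t) = exp(-t^2/2), beta_2 = 1, beta_4 = 3, I(X) = 1.
f        = lambda t: np.exp(-t**2 / 2)
f_prime  = lambda t: -t * f(t)
tf_prime = lambda t: (1 - t**2) * f(t)       # (t f(t))'

t = np.linspace(-10, 10, 4001)
lhs_73 = np.max(np.abs(t * f_prime(t)))      # sup_t |t f'(t)|
rhs_73 = 1 + np.sqrt(1.0)                    # 1 + sqrt(beta_2 I(X))

lhs_74 = quad(lambda u: tf_prime(u)**2, -np.inf, np.inf)[0]
rhs_74 = 6 * np.pi * np.sqrt(3.0)            # 6 pi sqrt(beta_4) I(X)^{3/2}

print(f"Corollary 7.3: {lhs_73:.4f} <= {rhs_73:.4f}")
print(f"Corollary 7.4: {lhs_74:.4f} <= {rhs_74:.4f}")
```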

8 Fisher information in terms of the second derivative

It will be convenient to work with formulas for the Fisher information, and for the parts of the corresponding integrals over half-axes, which involve the second derivative of the density. First we consider convolutions of two densities with finite Fisher information.

Proposition 8.1

If a random variable \(X\) has density \(p \in \mathfrak{P }_2\), then

$$\begin{aligned} I(X) = -\int _{-\infty }^\infty p^{\prime \prime }(x)\,\log p(x)\,dx, \end{aligned}$$
(8.1)

provided that

$$\begin{aligned} \int _{-\infty }^\infty |p^{\prime \prime }(x)\,\log p(x)|\,dx < +\infty . \end{aligned}$$
(8.2)

The latter condition holds, if  \(\mathbf{E}\,|X|^s < \infty \) for some \(s > 2\).

Strictly speaking, the integration in (8.1)–(8.2) should be performed over the open set \(G = \{x:p(x)>0\}\). One may extend this integration to the whole real line by using the convention \(0 \log 0 = 0\). This is consistent with the property that \(p^{\prime \prime }(x) = 0\), as soon as \(p(x) = 0\) (according to Proposition 6.3).

Proof

The assumption \(p \in \mathfrak{P }_2\) ensures that \(p\) has an absolutely continuous derivative \(p^{\prime }\) with Radon–Nikodym derivative \(p^{\prime \prime }\). By Proposition 6.2, \(p^{\prime }\) has bounded total variation, which justifies the possibility of integration by parts.

More precisely, assuming that \(p \in \mathfrak{P }_2\), let us decompose the set \(G\) into disjoint open intervals \((a_n,b_n)\), bounded or not. In particular, \(p(a_n) = p(b_n) = 0\), and by the bound (6.5),

$$\begin{aligned} |p^{\prime }(x)\log p(x)| \le \,I^{3/4} \sqrt{p(x)}\, |\log p(x)| \rightarrow 0, \quad \mathrm{as} \ \ x \downarrow a_n, \end{aligned}$$

and similarly for \(b_n\). Integrating by parts, we get for \(a_n < T_1 < T_2 < b_n\),

$$\begin{aligned} \int _{T_1}^{T_2} \frac{p^{\prime }(x)^2}{p(x)}\, dx&= \int _{T_1}^{T_2} p^{\prime }(x)\, d\log p(x) \\&= p^{\prime }(x) \log p(x) \bigg |_{x=T_1}^{T_2} - \int _{T_1}^{T_2} p^{\prime \prime }(x)\,\log p(x)\,dx. \end{aligned}$$

Letting \(T_1 \rightarrow a_n\) and \(T_2 \rightarrow b_n\), we get

$$\begin{aligned} \int _{a_n}^{b_n} \frac{p^{\prime }(x)^2}{p(x)}\, dx = - \int _{a_n}^{b_n} p^{\prime \prime }(x)\,\log p(x)\,dx, \end{aligned}$$

where the second integral is understood in the improper sense. It remains to perform summation over \(n\) on the basis of (8.2), and then we obtain (8.1).

To verify the integrability condition (8.2), one may apply an integral bound of Proposition 6.3. Namely, using Cauchy’s inequality, for the integral in (8.2) we have

$$\begin{aligned} \left( \,\,\int _{\{p(x)>0\}} \frac{|p^{\prime \prime }(x)|}{\sqrt{p(x)}}\ \sqrt{p(x)}\, |\log p(x)|\,dx\right) ^2 \, \le \, I^2 \int _{-\infty }^\infty p(x) \log ^2 p(x)\,dx. \end{aligned}$$

If the moment \(\beta _s = \mathbf{E}\, |X|^s\) is finite, Corollary 7.2 yields

$$\begin{aligned} p(x) \log ^2 p(x) \, \le \, C\, \frac{\log ^2(e + |x|)}{1 + |x|^{s/2}} \end{aligned}$$

with a constant \(C\) depending on \(I\) and \(\beta _s\). The latter function is integrable in case \(s>2\), so the integral in (8.2) is finite. \(\square \)

As the above argument shows, without the requirement that \(p \in \mathfrak{P }_2\) and the integrability condition (8.2), formula (8.1) still remains valid under the following assumptions:

  • \(p(x)\) is twice continuously differentiable on the real line;

  • \(p(x) > 0\), for all \(x\);

  • \(p^{\prime }(x) \log p(x) \rightarrow 0\), as \(|x| \rightarrow \infty \).

However, then the integral (8.1) should be understood in the improper sense, i.e., we have

$$\begin{aligned} I(X) \ = \, - \lim _{T_1 \rightarrow -\infty , \ T_2 \rightarrow \infty } \, \int _{T_1}^{T_2} p^{\prime \prime }(x)\,\log p(x)\,dx, \end{aligned}$$

where the limit exists regardless of whether the Fisher information is finite or not.
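For example, the standard normal density satisfies the three conditions above, and formula (8.1) can be verified numerically: \(-\int \varphi ^{\prime \prime }(x) \log \varphi (x)\,dx\) should return the value \(I(Z) = 1\). A small sketch assuming numpy/scipy (the logarithm is written out explicitly to avoid underflow):

```python
import numpy as np
from scipy.integrate import quad

phi     = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
phi2    = lambda x: (x**2 - 1) * phi(x)                    # phi''(x)
log_phi = lambda x: -x**2 / 2 - 0.5 * np.log(2 * np.pi)    # log phi(x), explicit

# Formula (8.1) for the standard normal density: the value should be 1.
val = -quad(lambda x: phi2(x) * log_phi(x), -np.inf, np.inf)[0]
print(f"-int phi'' log phi dx = {val:.6f}   (Fisher information of N(0,1) is 1)")
```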

In order to work under the standard moment assumption, that is, finiteness of the second moment, we consider densities representable as convolutions of more than two densities with finite Fisher information.

Proposition 8.2

If a random variable \(X\) has finite second moment and density \(p \in \mathfrak{P }_5\), then condition (8.2) holds, and for all \(-\infty \le a < b \le \infty \),

$$\begin{aligned} \int _a^b \frac{p^{\prime }(x)^2}{p(x)}\,1_{\{p(x)>0\}}\,dx \ \!=\! \ p^{\prime }(b)\log p(b) - p^{\prime }(a)\log p(a) - \int _a^b p^{\prime \prime }(x)\,\log p(x)\,dx. \nonumber \\ \end{aligned}$$
(8.3)

In particular, \(X\) has finite Fisher information given by (8.1).

Here we use the convention \(p^{\prime }(\pm \infty ) \log p(\pm \infty ) = 0\) for the case where \(a\) and/or \(b\) are infinite, together with

$$\begin{aligned} p^{\prime }(x)\,\log p(x) = p^{\prime \prime }(x)\,\log p(x) = 0 \quad \mathrm{in \ the \ case} \ \ p(x) = 0, \end{aligned}$$

as before in (8.1)–(8.2). To show that (8.2) is indeed fulfilled, it will be sufficient to prove the following pointwise bounds which are of independent interest.

Proposition 8.3

If  \(\mathbf{E}X^2 \le 1\) and \(X\) has density \(p \in \mathfrak{P }_5(I)\), then with some absolute constant \(C\), for all \(x\),

$$\begin{aligned} |p^{\prime \prime }(x)| \, \le \, CI^3\, \frac{1}{1 + x^2} \end{aligned}$$
(8.4)

and

$$\begin{aligned} |p^{\prime \prime }(x)\,\log p(x)| \, \le \, CI^3\, \frac{\log (e + |x|)}{1 + x^2}. \end{aligned}$$
(8.5)

Proof

The assumption \(\mathbf{E}X^2 \le 1\) implies \(I \ge 1\) (by Cramér–Rao’s inequality). Also, the characteristic function \(f(t) = \mathbf{E}\, e^{itX}\) is twice differentiable, and by Proposition 2.3, it satisfies

$$\begin{aligned} |f(t)| \le \frac{I^{5/2}}{|t|^5}. \end{aligned}$$

Hence, \(p\) may be described as the inverse Fourier transform

$$\begin{aligned} p(x) = \frac{1}{2\pi } \int _{-\infty }^\infty e^{-itx} f(t)\,dt, \end{aligned}$$

and a similar representation is also valid for the second derivative,

$$\begin{aligned} p^{\prime \prime }(x) = -\frac{1}{2\pi } \int _{-\infty }^\infty e^{-itx}\, t^2 f(t)\,dt. \end{aligned}$$
(8.6)

Write \(X = X_1 + \cdots + X_5\) with independent summands such that \(I(X_j) \le I\) and assume (without loss of generality) that they have equal means. Then \(\mathbf{E}X_j^2 \le 1\), hence the characteristic functions \(f_j(t)\) of \(X_j\) have second derivatives \(|f_j^{\prime \prime }(t)| \le 1\). Moreover, by Proposition 2.3 and Corollary 7.3,

$$\begin{aligned} |f_j(t)| \le \frac{I^{1/2}}{|t|}, \quad |f_j^{\prime }(t)| \le \frac{1 + I^{1/2}}{|t|}. \end{aligned}$$

Now, differentiation of the equality \(f(t) = f_1(t) \ldots f_5(t)\) leads to

$$\begin{aligned} f^{\prime }(t) = f_1^{\prime }(t)\,f_2(t) \ldots f_5(t) + \cdots + f_1(t) \ldots f_4(t)\,f_5^{\prime }(t), \end{aligned}$$

hence \(|f^{\prime }(t)| \le \frac{5 I^2\, (1 + I^{1/2})}{|t|^5}\). Differentiating once more, it should be clear that

$$\begin{aligned} |f^{\prime \prime }(t)| \, \le \, \frac{5 I^2}{t^4} + \frac{20\,I^{3/2} (1 + I^{1/2})^2}{|t|^5}. \end{aligned}$$

These estimates imply that

$$\begin{aligned} |(t^2 f(t))^{\prime }| \le \frac{CI^{5/2}}{|t|^3}, \quad |(t^2 f(t))^{\prime \prime }| \le \frac{CI^{5/2}}{t^2} \quad (|t| \ge 1) \end{aligned}$$

with some absolute constant \(C\). As a consequence, one may integrate in (8.6) by parts with \(x \ne 0\) to get

$$\begin{aligned} p^{\prime \prime }(x) \, = \, \frac{1}{2\pi x^2} \int _{-\infty }^\infty (t^2 f(t))^{\prime \prime }\,e^{-itx}\,dt. \end{aligned}$$

Hence, for all \(x \in \mathbf{R}\),

$$\begin{aligned} |p^{\prime \prime }(x)| \le \frac{CI^{5/2}}{1 + x^2} \end{aligned}$$
(8.7)

with some absolute constant \(C\), implying the required pointwise bound (8.4).

Now, to derive the second pointwise bound, first we recall that \(p(x) \le I^{1/2}\). Hence,

$$\begin{aligned} |\log p(x)| \le \log (I^{1/2}) + \log \frac{I^{1/2}}{p(x)}, \end{aligned}$$
(8.8)

where the last term is thus non-negative. Next, we partition the real line into the set \(A = \{x: p(x) \le \frac{I^{1/2}}{2(1 + x^4)}\}\) and its complement \(B\). On the set \(A\), by Proposition 6.4,

$$\begin{aligned} |p^{\prime \prime }(x)|\,\log \frac{I^{1/2}}{p(x)} \, \le \, I^{5/4} \sqrt{p(x)}\,\log \frac{I^{1/2}}{p(x)} \, \le \, C_1 I^{3/2}\, \frac{\log (e + |x|)}{1 + x^2}, \end{aligned}$$

and similarly, by (8.7), on the set \(B\) we have an analogous inequality

$$\begin{aligned} |p^{\prime \prime }(x)|\,\log \frac{I^{1/2}}{p(x)} \, \le \, |p^{\prime \prime }(x)|\,\log \left( 2 (1 + x^4)\right) \, \le \, C_2I^{5/2}\, \frac{\log (e + |x|)}{1 + x^2}. \end{aligned}$$

Thus, for all \(x\), applying (8.8) and again (8.7),

$$\begin{aligned} |p^{\prime \prime }(x) \log p(x)|&\le |p^{\prime \prime }(x)| \log (I^{1/2}) + |p^{\prime \prime }(x)| \log \frac{I^{1/2}}{p(x)} \\&\le C I^{5/2}\,(1 + \log I)\, \frac{\log (e + |x|)}{1 + x^2}. \end{aligned}$$

Proposition 8.3 is proved. \(\square \)

Proof of Proposition 8.2

Like in the proof of Proposition 8.1, first one should decompose the open set \(G = \{x \in (a,b): p(x)>0\}\) into disjoint open intervals \((a_n,b_n)\). If \(G = (a,b)\), then for \(a < T_1 < T_2 < b\), we have

$$\begin{aligned} \int _{T_1}^{T_2} \frac{p^{\prime }(x)^2}{p(x)}\, dx = p^{\prime }(x) \log p(x) \bigg |_{x=T_1}^{T_2} - \int _{T_1}^{T_2} p^{\prime \prime }(x)\,\log p(x)\,dx. \end{aligned}$$
(8.9)

In case \(b = \infty \), \(p(x) \rightarrow 0\) as \(x \rightarrow \infty \), by Corollary 7.2, so that \(p^{\prime }(x)\log p(x) \rightarrow 0\), due to Proposition 6.2. If \(b<\infty \), then \(p^{\prime }(x)\log p(x) \rightarrow p^{\prime }(b)\log p(b)\), as \(x \rightarrow b\), with the limit being zero in case \(p(b) = 0\). A similar conclusion is also true about the point \(a\). Hence, letting \(T_1 \rightarrow a\) and \(T_2 \rightarrow b\) in (8.9), we arrive at the desired equality (8.3). Moreover, the pointwise bound (8.5) confirms that the right integral in (8.3) is absolutely convergent.

If the decomposition of \(G\) contains more than one interval, similar arguments should be applied in every interval \((a_n,b_n)\) with the following remark. If \(b_n < b\), then necessarily \(p(b_n)=0\), so \(p^{\prime }(x)\log p(x) \rightarrow 0\), as \(x \rightarrow b_n\) (and likewise for the end points \(a_n > a\)). Then it will remain to perform summation of the obtained equalities over all \(n\). \(\square \)

9 Normalized sums. Proof of Theorem 1.3

By the definition of classes \(\mathfrak{P }_k\) (\(k = 1,2,\ldots \)), the normalized sum

$$\begin{aligned} Z_n = \frac{X_1 + \cdots + X_n}{\sqrt{n}} \end{aligned}$$

of independent random variables \(X_1,\ldots ,X_n\) with finite Fisher information has density \(p_n\) belonging to \(\mathfrak{P }_k\), as long as \(n \ge k\).

Moreover, if \(I(X_j) \le I\) for all \(j\), then \(p_n \in \mathfrak{P }_k(2kI)\). Indeed, one can partition the collection \(X_1,\ldots ,X_n\) into \(k\) groups and write \(Z_n = U_1 + \cdots + U_k\) with

$$\begin{aligned} U_i = \frac{1}{\sqrt{n}}\,\sum _{j = 1}^{m} X_{(i-1)m + j} \ \ (1 \le i \le k-1), \quad U_k = \frac{1}{\sqrt{n}}\,\sum _{j = (k-1)m + 1}^n X_j, \end{aligned}$$

where \(m = [\frac{n}{k}]\). By Stam’s inequality (6.1), for \(1 \le i \le k-1\)

$$\begin{aligned} \frac{1}{I(U_i)} \, \ge \, \frac{1}{n}\, \sum _{j = 1}^{m} \frac{1}{I(X_{(i-1)m + j})} \, \ge \, \frac{m}{nI} \, \ge \, \frac{1}{2kI}, \end{aligned}$$

and similarly \(\frac{1}{I(U_k)} \ge \frac{1}{2kI}\).

Therefore, the previous observations about densities from \(\mathfrak{P }_k\) are applicable to \(Z_n\) with sufficiently large \(n\), as soon as the \(X_j\) have finite Fisher information with a common bound on \(I(X_j)\).

In the i.i.d. case, a similar application of (6.1) also yields \(I(Z_n) \le 2 I(Z_{n_0})\). Here, the factor \(2\) may actually be removed as a consequence of a generalization of Stam’s inequality obtained by Artstein, Ball, Barthe and Naor. It is formulated below as a separate proposition (although for our purposes the weaker inequality is sufficient).

Proposition 9.1

[2] If \((X_n)_{n \ge 1}\) are independent and identically distributed, then \(I(Z_n) \le I(Z_{n_0})\), for all \(n \ge n_0\).
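A concrete illustration of this monotonicity (a numerical sketch only): for centered exponential summands \(X_i = E_i - 1\) with \(E_i\) standard exponential, the density of \(X_1\) has a jump, so \(I(X_1) = \infty \), while for \(n \ge 3\) one can check that \(I(Z_n) = n/(n-2)\), which decreases to \(I(Z) = 1\). The code below (numpy/scipy) evaluates \(I(Z_n)\) directly from the Gamma density of the sum and compares with this value:

```python
import numpy as np
from scipy.integrate import quad
from scipy.special import gammaln

# X_i = E_i - 1 with E_i ~ Exp(1), so X_1 + ... + X_n + n ~ Gamma(n, 1)
# and Z_n = (Gamma(n,1) - n)/sqrt(n).
def fisher_Zn(n):
    def integrand(x):
        y = np.sqrt(n) * x + n
        if y <= 0:
            return 0.0
        g = np.exp((n - 1) * np.log(y) - y - gammaln(n))    # Gamma(n,1) density at y
        # p_n'(x)^2 / p_n(x), expressed through g to avoid 0/0 in the far tails
        return n ** 1.5 * ((n - 1) / y - 1) ** 2 * g
    return quad(integrand, -np.sqrt(n) + 1e-8, np.inf, limit=300)[0]

for n in (3, 5, 10, 30, 100):
    print(n, round(fisher_Zn(n), 4), "  n/(n-2) =", round(n / (n - 2), 4))
```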

We are now ready to return to Theorem 1.3 and complete its proof.

Proof of Theorem 1.3

Let \((X_n)_{n \ge 1}\) have finite second moment and a common characteristic function \(f_1\). The characteristic function of \(Z_n\) is thus

$$\begin{aligned} f_n(t) = \mathbf{E}\, e^{itZ_n} = f_1\left( \frac{t}{\sqrt{n}}\right) ^n. \end{aligned}$$
(9.1)

\((a) \Rightarrow (b)\), according to Proposition 2.2 applied to \(X = Z_n\).

\((b) \Rightarrow (a)\) and \((c)\). If \(Z_{n_1}\) has density \(p_{n_1}\) of bounded total variation, Proposition 4.1 yields

$$\begin{aligned} I(Z_{3n_1}) = I(p_{3n_1}) \le \frac{3}{2}\,\Vert p_{3n_1}\Vert _{\mathrm{TV}}^2 < \infty . \end{aligned}$$

In particular, \(p_{3n_1}\) has a continuous derivative and finite total variation.

\((c) \Rightarrow (a)\), for the same reason, and thus the conditions \((a)\)–\((c)\) are equivalent.

\((a) \Rightarrow (d)\). Assume that \(I(Z_{n_0}) < \infty \), for some fixed \(n_0 \ge 1\). Applying Proposition 2.3 with \(X = Z_{n_0}\), it follows that

$$\begin{aligned} |f_{n_0}(t)| \le \frac{1}{t}\,\sqrt{I(Z_{n_0})}, \quad t > 0. \end{aligned}$$

Hence, \(|f_1(t)| \le Ct^{-\varepsilon }\) with constants \(\varepsilon = \frac{1}{n_0}\) and \(C = \left( I(Z_{n_0})/n_0\right) ^{1/2n_0}\).

\((d) \Rightarrow (e)\) is obvious.

\((e) \Rightarrow (c)\). Differentiating the formula (9.1) and using the integrability assumption (1.8) on \(f_1\), we see that, for all \(n \ge \nu + 2\), the characteristic function \(f_n\) and its first two derivatives are integrable with weight \(|t|\). This implies that \(Z_n\) has a continuously differentiable density

$$\begin{aligned} p_n(x) = \frac{1}{2\pi }\, \int _{-\infty }^\infty e^{-itx} f_n(t)\,dt, \end{aligned}$$
(9.2)

which, by Proposition 5.1, has finite total variation

$$\begin{aligned} \Vert p_n\Vert _{\mathrm{TV}} \, = \, \int _{-\infty }^\infty |p_n^{\prime }(x)|\,dx \, \le \, \frac{1}{2}\, \int _{-\infty }^\infty \left( |tf_n^{\prime \prime }(t)| + 2\,|f_n^{\prime }(t)| + |t f_n(t)|\right) \,dt. \end{aligned}$$

Thus, Theorem 1.3 is proved.

Remark 9.2

If we assume in Theorem 1.3 finiteness of the first absolute moment of \(X_1\) (rather than the finiteness of the second moment), the statement will remain valid, provided that the integrability condition \((e)\) is replaced with a stronger condition like

$$\begin{aligned} \int _{-\infty }^\infty |f_1(t)|^\nu \,t^2\,dt < \infty , \quad \mathrm{for \ some} \ \ \nu > 0. \end{aligned}$$
(9.3)

In this case, it follows from (9.1) that, for all \(n \ge \nu + 1\), the characteristic function \(f_n\) and its derivative are square integrable with weight \(t^2\). Therefore, according to Proposition 5.2, the normalized sum \(Z_n\) has density \(p_n\) with finite total variation

$$\begin{aligned} \Vert p_n\Vert _{\mathrm{TV}} \, \le \, \left( \,\,\int _{-\infty }^\infty |t f_n(t)|^2\,dt \int _{-\infty }^\infty |(tf_n(t))^{\prime }|^2\,dt\right) ^{1/4}. \end{aligned}$$

As a result, we obtain the chain of implications (9.3) \(\Rightarrow (b) \Rightarrow (a) \Rightarrow (d)\). The latter condition ensures that \(p_n\) admits the representation (9.2) and has a continuous derivative for sufficiently large \(n\). That is, we obtain \((c)\).

10 Edgeworth-type expansions

In the sequel, let \((X_n)_{n \ge 1}\) be independent identically distributed random variables with mean \(\mathbf{E}X_1 = 0\) and variance \(\mathrm{Var}(X_1) = 1\). Here we collect some auxiliary results about Edgeworth-type expansions for the distribution functions \(F_n(x) = \mathbf{P}\{Z_n \le x\}\) and the densities \(p_n\) of the normalized sums \(Z_n = (X_1 + \cdots + X_n)/\sqrt{n}\).

We recall that

$$\begin{aligned} \varphi (x) = \frac{1}{\sqrt{2\pi }}\,e^{-x^2/2}, \quad x \in \mathbf{R}, \end{aligned}$$

stands for the density of the standard normal law. If the absolute moment \(\beta _s = \mathbf{E}\,|X_1|^s\) is finite for a given integer \(s \ge 2\), define

$$\begin{aligned} \varphi _s(x) = \varphi (x) + \sum _{k=1}^{s-2} q_k(x)\,n^{-k/2} \end{aligned}$$

with the functions \(q_k\) described in the introductory section, i.e.,

$$\begin{aligned} q_k(x) \ = \, \varphi (x)\, \sum H_{k + 2j}(x) \, \frac{1}{r_1!\ldots r_k!}\, \left( \frac{\gamma _3}{3!}\right) ^{r_1} \ldots \left( \frac{\gamma _{k+2}}{(k+2)!}\right) ^{r_k}. \end{aligned}$$
(10.1)

Here, \(H_l\) denotes the Chebyshev-Hermite polynomial of degree \(l \ge 0\) with leading coefficient 1, and the summation is running over all non-negative solutions \((r_1,\ldots ,r_k)\) to the equation \(r_1 + 2 r_2 + \cdots + k r_k = k\) with notation \(j = r_1 + \cdots + r_k\). Put also

$$\begin{aligned} \Phi _s(x) \, = \, \int _{-\infty }^x \varphi _s(y)\,dy \, = \, \Phi (x) + \sum _{k=1}^{s-2} Q_k(x)\,n^{-k/2}. \end{aligned}$$

Similarly to \(q_k\), the functions \(Q_k\) have an explicit description involving the cumulants \(\gamma _3,\ldots ,\gamma _{k+2}\) of \(X_1\), namely,

$$\begin{aligned} Q_k(x) \ = \, -\varphi (x) \sum H_{k + 2j-1}(x) \, \frac{1}{r_1!\ldots r_k!}\, \left( \frac{\gamma _3}{3!}\right) ^{r_1} \ldots \left( \frac{\gamma _{k+2}}{(k+2)!}\right) ^{r_k}, \end{aligned}$$
(10.2)

where the summation is the same as in (10.1), cf. [5] or [20].
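As a computational aside (not used in the proofs), the functions \(q_k\) are easy to generate: (10.1) gives \(q_1(x) = \frac{\gamma _3}{6}\, H_3(x)\varphi (x)\) and \(q_2(x) = \varphi (x)\left[ \frac{\gamma _4}{24}\, H_4(x) + \frac{\gamma _3^2}{72}\, H_6(x)\right] \). The sketch below builds them with numpy's probabilists' Hermite polynomials (the hermite_e module, whose polynomials have leading coefficient 1); the cumulant values are those of a centered exponential variable, used purely as an example.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermeval   # probabilists' Hermite polynomials

phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

def He(n, x):
    """Chebyshev-Hermite polynomial H_n with leading coefficient 1."""
    c = np.zeros(n + 1)
    c[n] = 1.0
    return hermeval(x, c)

def q1(x, g3):
    # k = 1 in (10.1): the only solution is r_1 = 1, so j = 1.
    return phi(x) * He(3, x) * (g3 / 6.0)

def q2(x, g3, g4):
    # k = 2 in (10.1): the solutions are (r_1, r_2) = (0, 1) and (2, 0).
    return phi(x) * (He(4, x) * (g4 / 24.0) + He(6, x) * (g3**2 / 72.0))

# Example cumulants (centered exponential variable): gamma_3 = 2, gamma_4 = 6.
x = np.linspace(-4, 4, 9)
print(q1(x, 2.0))
print(q2(x, 2.0, 6.0))
```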

The functions \(\varphi _s\) and \(\Phi _s\) are used to approximate the density and the distribution function of \(Z_n\) with error of order smaller than \(n^{-(s-2)/2}\). The following lemma is classical.

Lemma 10.1

Assume that  \(\limsup _{|t| \rightarrow \infty } |f_1(t)| < 1\). If \(\mathbf{E}\,|X_1|^s < \infty \) \((s \ge 2)\), then as \(n \rightarrow \infty \), uniformly over all \(x\)

$$\begin{aligned} (1 +|x|^s) \left( F_n(x) - \Phi _{s}(x)\right) = o\left( n^{-(s-2)/2}\right) . \end{aligned}$$
(10.3)

Actually, the relation (10.3) remains valid for real values \(s \ge 2\), in which case \(\Phi _s\) should be replaced with \(\Phi _{[s]}\). For the range \(2 \le s < 3\), Cramér’s condition on the characteristic function is not used, cf. [19]; the range \(s \ge 3\) is treated in [20] (Theorem 2, Ch. VI, p. 168).

We also need to describe the approximation of densities. Recall that \(Z_n\) have the characteristic functions

$$\begin{aligned} f_n(t) = f_1\left( \frac{t}{\sqrt{n}}\right) ^n, \end{aligned}$$

where \(f_1\) stands for the characteristic function of \(X_1\). If the Fisher information \(I = I(Z_{n_0})\) is finite, then, by Proposition 2.3,

$$\begin{aligned} |f_1(t)|^{2n_0} \le \frac{I}{I + n_0 t^2}, \quad t \in \mathbf{R}. \end{aligned}$$
(10.4)

Hence, given \(m \ge 1\), we have a polynomial bound \(|f_n(t)| \le c\,|t|^{-m}\) for \(n \ge m n_0\), with a constant \(c\) that does not depend on \(t\). So, for all sufficiently large \(n\), \(Z_n\) have continuous bounded densities

$$\begin{aligned} p_n(x) = \frac{1}{2\pi }\, \int _{-\infty }^\infty e^{-itx} f_n(t)\,dt, \end{aligned}$$

which have continuous derivatives

$$\begin{aligned} p_n^{(l)}(x) = \frac{1}{2\pi }\, \int _{-\infty }^\infty (-it)^l\, e^{-itx} f_n(t)\,dt \end{aligned}$$

of any prescribed order.

Lemma 10.2

Assume that \(I(Z_{n_0}) < \infty \), for some \(n_0\), and let \(\mathbf{E}\, |X_1|^s < \infty \) \((s \ge 2)\). Fix \(l = 0,1,\ldots \). Then, for all sufficiently large \(n\),

$$\begin{aligned} (1 + |x|^s)\, |p_n^{(l)}(x) - \varphi _s^{(l)}(x)| \, \le \, \psi _{l,n}(x)\,\frac{\varepsilon _n}{n^{(s-2)/2}}, \quad x \in \mathbf{R}, \end{aligned}$$
(10.5)

where \(\varepsilon _n \rightarrow 0\), as \(n \rightarrow \infty \), and

$$\begin{aligned} \sup _x \, |\psi _{l,n}(x)| \le 1, \quad \int _{-\infty }^\infty \psi _{l,n}(x)^2\,dx \le 1. \end{aligned}$$

For the proof of Theorem 1.1, the lemma will be used with the values \(l = 0,1,2\) only. In case \(l=0\), this lemma with the first bound \(\sup _x \, |\psi _{l,n}(x)| \le 1\) is a well-known result. It does not require finiteness of the Fisher information, but only boundedness of \(p_n\) for large \(n\). We can refer to [20], p. 211 in case \(s \ge 3\) and to [20], pp. 198–201 for the case \(s=2\) when \(\varphi _s = \varphi \).

Proof

The result follows from the corresponding approximation of \(f_n\) by the Fourier transform of \(\varphi _s\) on growing intervals, and here we recall a standard argument. Introduce the “corrected normal” characteristic function

$$\begin{aligned} g_s(t) = e^{-t^2/2} + e^{-t^2/2}\, \sum _{k=1}^{s-2} P_k(it)\, n^{-k/2}, \quad t \in \mathbf{R}, \end{aligned}$$

where

$$\begin{aligned} P_k(it) \ \ = \sum _{r_1 + 2 r_2 + \cdots + k r_k = k} \frac{1}{r_1!\ldots r_k!}\, \left( \frac{\gamma _3}{3!}\right) ^{r_1} \ldots \left( \frac{\gamma _{k+2}}{(k+2)!}\right) ^{r_k} (it)^{k + 2(r_1 + \cdots + r_k)}. \end{aligned}$$

This function may also be defined as the Fourier transform of \(\varphi _s\), i.e.,

$$\begin{aligned} g_s(t) = \int _{-\infty }^\infty e^{itx} \varphi _s(x)\,dx. \end{aligned}$$

Note that \(g_2(t) = e^{-t^2/2}\) in the case \(s=2\).

If \(s \ge 3\), by Lemma 3 in [20], p. 209, in the interval \(|t| \le n^{1/7}\), or even for \(|t| \le n^{1/6}\) (cf. e.g. [9, 10], Proposition 9.1), we have

$$\begin{aligned} \left| f_n^{(m)}(t) - g_s^{(m)}(t)\right| \le \frac{\varepsilon _n}{n^{(s-2)/2}}\, \left( |t|^{s-m} + |t|^{2s^2}\right) \,e^{-t^2/2}, \quad m = 0,1,\ldots ,s, \nonumber \\ \end{aligned}$$
(10.6)

where \(\varepsilon _n \rightarrow 0\), as \(n \rightarrow \infty \) (not depending on \(t\)). In case \(s=2\), one only has

$$\begin{aligned} |f_n^{(m)}(t) - g_2^{(m)}(t)| \le \varepsilon _n e^{-t^2/2}, \quad |t| \le T_n, \ \ m = 0,1,2, \end{aligned}$$
(10.7)

with some \(\varepsilon _n \rightarrow 0\) and \(T_n \rightarrow \infty \), as \(n \rightarrow \infty \) (cf. e.g. [9, 10], Proposition 5.1). On the other hand, on larger intervals \(|t| \le \sqrt{n}\), with some positive constants \(C\) and \(c\), there is a simple subgaussian bound

$$\begin{aligned} \left| f_n^{(m)}(t)\right| \le Ce^{-ct^2} \quad (0 \le m \le s, \ n \ge 2s), \end{aligned}$$
(10.8)

which easily follows from

$$\begin{aligned} |f_1(u)| \le e^{-c_1 u^2}, \quad \left| f_1^{\prime }(u)\right| \le |u|, \quad \left| f_1^{(s)}(u)\right| \le \beta _s \quad (|u| \le 1). \end{aligned}$$

Combining (10.8) with (10.6)–(10.7), we get a unified estimate

$$\begin{aligned} \left| f_n^{(m)}(t) - g_s^{(m)}(t)\right| \le \frac{\varepsilon _n}{n^{(s-2)/2}}\, \,e^{-ct^2}, \quad |t| \le \sqrt{n}, \ \ m = 0,1,\ldots ,s, \end{aligned}$$
(10.9)

with some sequence \(\varepsilon _n \rightarrow 0\) and some \(c>0\) depending on the distribution of \(X_1\), only.

Now, since \(f_n\) is integrable for large \(n\), one may write

$$\begin{aligned} p_n(x) - \varphi _s(x) = \frac{1}{2\pi }\, \int _{-\infty }^\infty e^{-itx}\, (f_n(t) - g_s(t))\,dt. \end{aligned}$$

Moreover, with our assumptions on \(f_1\), one can differentiate this equality \(l\) times and then integrate by parts \(m \le s\) times to get

$$\begin{aligned} (ix)^m \left( p_n^{(l)}(x) - \varphi _s^{(l)}(x)\right) = \frac{1}{2\pi }\, \int _{-\infty }^\infty e^{-itx}\, \frac{d^m}{dt^m}\, \left[ (-it)^l\,(f_n(t) - g_s(t))\right] \,dt. \nonumber \\ \end{aligned}$$
(10.10)

More precisely, by the polynomial differentiation formula, for any \(r=0,1,\ldots ,s\),

$$\begin{aligned} \Big |\frac{d^r}{dt^r}\,f_1(t)^n\Big | \le \beta _r n^r\, |f_1(t)|^{n-r}, \end{aligned}$$

and then, by the Newton binomial formula,

$$\begin{aligned} \Big |\frac{d^m}{dt^m}\,\big [\,t^l f_1(t)^n\big ]\Big |&\le \sum _{r=0}^m \frac{m!}{r!\, (m-r)!}\, |(t^l)^{(r)}| \cdot \beta _{m-r} n^{m-r}\, |f_1(t)|^{n-(m-r)} \\&\le \beta _m n^m l! \sum _{r=0}^{\min (l,m)} \frac{m!}{r!\, (m-r)!}\, |t|^{l-r} \cdot \, |f_1(t)|^{n-(m-r)}. \end{aligned}$$

But \(\sup _{|t| \ge 1} |f_1(t)| = \alpha < 1\), so that, by (10.4), for \(n \ge n_1 = s + (l+2)n_0\), one can write

$$\begin{aligned} |f_1(t)|^{n-(m-r)}&= |f_1(t)|^{(l-r+2)n_0} \cdot |f_1(t)|^{n - (m-r) - (l-r+2)n_0} \\&\le \left( \frac{I}{I + t^2}\right) ^{(l-r+2)/2} \alpha ^{n-n_1}. \end{aligned}$$

Hence, just using \(\frac{t^2}{I + t^2} \le 1\), we have

$$\begin{aligned} \Big |\frac{d^m}{dt^m}\,\big [\,t^l f_1(t)^n\big ]\Big |&\le \beta _m n^m l!\, \alpha ^{n-n_1} \sum _{r=0}^{\min (m,l)} \frac{m!}{r!\, (m-r)!}\, \left( \frac{I t^2}{I + t^2}\right) ^{(l-r)/2} \frac{I}{I + t^2} \\&\le \beta _s (2n I)^m l!\, \alpha ^{n-n_1}\, \frac{I}{I + t^2}. \end{aligned}$$

This estimate easily implies

$$\begin{aligned} \Big |\frac{d^m}{dt^m}\,\big [\,t^l f_n(t)\big ]\Big | \le C\alpha _1^n\, \frac{1}{1 + t^2}, \quad \mathrm{for} \ \ |t| \ge \sqrt{n}, \ \ n \ge n_1, \end{aligned}$$
(10.11)

where the positive constants \(C\) and \(\alpha _1 < 1\) may depend on \(m,l\), and the distribution of \(X_1\), but not on \(t\). In particular, the representation (10.10) is fully justified.

The estimate (10.11) also shows that the part of the integral in (10.10) over the region \(|t| \ge \sqrt{n}\) decays exponentially fast uniformly over all \(x\). As for the interval \(|t| \le \sqrt{n}\), one may use the bound (10.9) in (10.10), so that eventually

$$\begin{aligned} \sup _x |x|^m \left| p_n^{(l)}(x) - \varphi _s^{(l)}(x)\right| \le \frac{\varepsilon _n}{n^{(s-2)/2}}, \quad \varepsilon _n \rightarrow 0. \end{aligned}$$

For the same reasons, we obtain a similar bound for the \(L^2\) norm of the right-hand side of (10.10) as a function of \(x\), by applying Plancherel’s formula. \(\square \)
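To see Lemma 10.2 at work numerically (a pure illustration, not part of the argument), one can take centered exponential summands again, for which \(p_n\) is an explicit shifted Gamma density and \(\gamma _3 = 2\), and compare \(p_n\) with \(\varphi \) and with \(\varphi _3 = \varphi + q_1 n^{-1/2}\). A sketch assuming numpy/scipy:

```python
import numpy as np
from scipy.special import gammaln

# Centered exponential summands: Z_n = (Gamma(n,1) - n)/sqrt(n), gamma_3 = 2.
def p_n(x, n):
    y = np.sqrt(n) * x + n
    out = np.zeros_like(x, dtype=float)
    pos = y > 0
    out[pos] = np.sqrt(n) * np.exp((n - 1) * np.log(y[pos]) - y[pos] - gammaln(n))
    return out

phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
q1  = lambda x: (2.0 / 6.0) * (x**3 - 3 * x) * phi(x)     # (gamma_3/6) H_3(x) phi(x)

x = np.linspace(-4, 6, 2001)
for n in (10, 100, 1000):
    err_clt  = np.max(np.abs(p_n(x, n) - phi(x)))
    err_edge = np.max(np.abs(p_n(x, n) - (phi(x) + q1(x) / np.sqrt(n))))
    print(f"n = {n:5d}:  max|p_n - phi| = {err_clt:.5f},  max|p_n - phi_3| = {err_edge:.5f}")
```

The first error decreases at the rate \(n^{-1/2}\), while the corrected one decreases noticeably faster, in line with (10.5).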

11 Behaviour of densities not far from the origin

To study the asymptotic behavior of the Fisher information distance

$$\begin{aligned} I(Z_n||Z) = \int _{-\infty }^\infty \frac{(p_n^{\prime }(x) + xp_n(x))^2}{p_n(x)}\ dx, \end{aligned}$$

we split the domain of integration into the interval \(|x| \le T_n\) and its complement. Thus, define

$$\begin{aligned} J_0 = \int _{|x| \le T_n} \frac{(p_n^{\prime }(x) + xp_n(x))^2}{p_n(x)}\ dx \end{aligned}$$

and similarly \(J_1\) for the region \(|x| > T_n\). If \(T_n\) is not too large, the first integral can be treated with the help of Lemma 10.2. Namely, we take

$$\begin{aligned} T_n = \sqrt{(s-2)\log n + s \log \log n + \rho _n} \quad (s>2), \end{aligned}$$
(11.1)

where \(\rho _n \rightarrow \infty \) is a sufficiently slowly growing sequence whose growth is restricted by the decay of the sequence \(\varepsilon _n\) in (10.5). In other words, \([-T_n,T_n]\) represents an asymptotically largest interval, where we can guarantee that the densities \(p_n\) of \(Z_n\) are separated from zero, and moreover, \(\sup _{|x| \le T_n} |\frac{p_n(x)}{\varphi (x)} - 1| \rightarrow 0\). To cover the case \(s=2\), one may put \(T_n = \sqrt{\rho _n}\), where \(\rho _n \rightarrow \infty \) is again a sufficiently slowly growing sequence. With this choice of \(T_n\), the integral \(J_1\) can be estimated by means of moderate deviation bounds.

In this section we focus on \(J_0\) and provide an asymptotic expansion for it with a remainder term which turns out to be slightly better in comparison with the resulting expansion (1.3) of Theorem 1.1.

Lemma 11.1

Let \(s \ge 3\) be an integer. If \(I(Z_{n_0}) < \infty \), for some \(n_0\), then

$$\begin{aligned} J_0 = \frac{c_1}{n} + \frac{c_2}{n^{2}} + \cdots + \frac{c_{[(s-2)/2]}}{n^{[(s-2)/2]}} + o\left( \frac{1}{n^{(s-2)/2}\,(\log n)^{(s-1)/2}}\right) , \end{aligned}$$

where the coefficients \(c_j\) are defined in (1.4).

Proof

Let us adopt the convention to write \(\delta _n\) for any sequence of functions satisfying \(|\delta _n(x)| \le \varepsilon _n n^{-(s-2)/2}\) with \(\varepsilon _n \rightarrow 0\), as \(n \rightarrow \infty \), at least on the intervals \(|x| \le T_n\). For example, the statement of Lemma 10.2 with \(l=0\) may be written as

$$\begin{aligned} p_n(x) = (1 + u_s(x))\varphi (x) + \frac{\delta _n}{1+|x|^s}, \end{aligned}$$
(11.2)

where

$$\begin{aligned} u_s(x) \, = \, \frac{\varphi _s(x)-\varphi (x)}{\varphi (x)} \, = \, \sum _{k=1}^{s-2}\, \frac{q_k(x)}{\varphi (x)} \ \frac{1}{n^{k/2}}. \end{aligned}$$

Combining the lemma with \(l=0\) and \(l=1\), we obtain another representation

$$\begin{aligned} p_n^{\prime }(x) + xp_n(x) = w_s(x) + \frac{\delta _n}{1+|x|^{s-1}}, \end{aligned}$$
(11.3)

where

$$\begin{aligned} w_s(x) \, = \, \sum _{k=1}^{s-2}\, \frac{q^{\prime }_k(x)+xq_k(x)}{n^{k/2}}. \end{aligned}$$

Note that the functions \(u_s\) and \(w_s\) depend on \(n\) as a parameter and become small as \(n\) grows. More precisely, it follows from the definition of \(q_k\) that, for all \(x \in \mathbf{R}\),

$$\begin{aligned} \frac{|w_s(x)|}{\varphi (x)} \, \le \, C_s\frac{1+|x|^{3(s-1)}}{\sqrt{n}} \quad \text{ and } \quad |u_s(x)| \, \le \, C_s\frac{1+|x|^{3(s-2)}}{\sqrt{n}} \end{aligned}$$
(11.4)

with some constants depending on \(s\) and the cumulants of \(X_1\), only. In particular, for \(|x| \le T_n\) and any prescribed \(0 < \varepsilon < \frac{1}{2}\),

$$\begin{aligned} \frac{|w_s(x)|}{\varphi (x)} \, < \, \frac{1}{n^{\frac{1}{2} - \varepsilon }} \quad \text{ and } \quad |u_s(x)| \, < \, \frac{1}{4} \end{aligned}$$
(11.5)

with sufficiently large \(n\). In addition, with a properly chosen sequence \(\rho _n\), we have

$$\begin{aligned} \frac{\delta _n}{T_n^s\,\varphi (T_n)} \, < \, \frac{1}{4}. \end{aligned}$$
(11.6)

Hence, by Lemma 10.2, \(|\frac{p_n(x)}{\varphi (x)} - 1| < \frac{1}{2}\) on the interval \(|x| \le T_n\).

Now, for \(|x| \le T_n\)

$$\begin{aligned} \left( 1+u_s(x)\right) ^{-1} - \left( 1 + u_s(x) + \frac{\delta _n}{(1+|x|^s)\varphi (x)}\right) ^{-1} = \frac{\delta _n}{(1+|x|^s)\varphi (x)}, \end{aligned}$$

and we obtain from (11.2)

$$\begin{aligned} \frac{1}{p_n(x)} \, = \, \frac{1}{(1+u_s(x))\varphi (x)} + \frac{\delta _n}{(1+|x|^s)\varphi (x)^2}. \end{aligned}$$

Combining this with (11.3) and using (11.5), we are led to

$$\begin{aligned} \frac{(p_n^{\prime }(x)+xp_n(x))^2}{p_n(x)} = \frac{w_s(x)^2}{(1+u_s(x))\varphi (x)}+\sum _{j=1}^5 r_{nj}(x), \quad |x| \le T_n, \end{aligned}$$

where

$$\begin{aligned} r_{n1}&= \frac{w_s(x)}{(1+|x|^{s-1})\varphi (x)} \ \delta _n, \quad \ \ r_{n2} \ = \ \frac{w_s(x)^2}{(1+|x|^s)\varphi (x)^2} \ \delta _n, \\ r_{n3}&= \frac{w_s(x)}{(1+|x|^{{2s-1}})\varphi (x)^2} \ \delta _n^2, \quad r_{n4} \ = \ \frac{1}{(1+|x|^{2s-2})\varphi (x)} \ \delta _n^2, \\ r_{n5}&= \frac{1}{(1+|x|^{3s-2})\varphi (x)^2} \ \delta _n^3. \end{aligned}$$

Here, according to the left inequality in (11.5), the remainder terms \(r_{n1}(x)\) and \(r_{n2}(x)\) are uniformly bounded on \([-T_n,T_n]\) by \(|\delta _n|\, n^{-1/3}\). A similar bound also holds for \(r_{n3}(x)\), by taking into account (11.6). In addition, integrating by parts, for large \(n\) and with some constants (independent of \(n\)), we have

$$\begin{aligned} \int _{|x|\le T_n}|r_{n4}(x)|\,dx&\le \frac{C\varepsilon _n}{n^{s-2}} \int _1^{T_n} \frac{1}{x^{2s-2}}\,e^{x^2/2}\,dx \\&\le \frac{C^{\prime } \varepsilon _n}{n^{s-2}} \ \frac{1}{T_n^{2s-1}} \, e^{T_n^2/2}= o\left( \frac{1}{T_n^{s - 1}\, n^{(s-2)/2}}\right) . \end{aligned}$$

With a similar argument, the same \(o\)-relation also holds for the integral of \(|r_{n5}(x)|\).

Thus,

$$\begin{aligned} \int _{|x|\le T_n} \frac{(p_n^{\prime }+xp_n)^2}{p_n} \ dx = \int _{|x|\le T_n} \frac{w_s^2}{(1+u_s)\varphi } \ dx + o\left( \frac{1}{T_n^{s-1}n^{(s-2)/2}}\right) . \end{aligned}$$
(11.7)

Now, by Taylor’s expansion around zero, in the interval \(|u|\le \frac{1}{4}\) we have

$$\begin{aligned} \frac{1}{1+u} \, = \, \sum _{k=0}^{s-4}\, (-1)^ku^k + \theta u^{s-3}, \quad |\theta | < 2 \end{aligned}$$

(there are no terms in the sum for \(s=3\)). Hence, with some \(-2 < \theta _n < 2\)

$$\begin{aligned} \int _{|x|\le T_n}\frac{w_s^2}{(1+u_s)\varphi }\,dx \, = \, \sum _{k=0}^{s-4}\, (-1)^k \int _{|x|\le T_n} w_s^2 u_s^k\,\frac{dx}{\varphi } + \theta _n \int _{|x|\le T_n} w_s^2 u_s^{s-3}\,\frac{dx}{\varphi }. \end{aligned}$$

At the expense of a small error, these integrals may be extended to the whole real line. Indeed, for large enough \(n\), by (11.4), we have, for \(k=0,1,\ldots ,s-4\) with some common constant \(C_s\)

$$\begin{aligned} \int _{|x|>T_n} w_s^2\, |u_s|^k\,\frac{dx}{\varphi }\!\le \! \frac{C_s}{n^{(k+2)/2}}\int _{|x|>T_n}\left( 1\!+\!|x|^{(3k+6)(s-1)}\right) \,\varphi (x)\,dx \!=\! o\left( \frac{1}{n^{(s-1)/2}}\right) . \end{aligned}$$

Moreover,

$$\begin{aligned} \int _{-\infty }^\infty w_s^2\, |u_s|^{s-3}\,\frac{dx}{\varphi } \, = \, O\left( \frac{1}{n^{(s-1)/2}}\right) . \end{aligned}$$

Therefore,

$$\begin{aligned} \int _{|x|\le T_n} \frac{w_s^2}{(1+u_s)\varphi }\,dx \, = \, \sum _{k=0}^{s-4} \, (-1)^k \int _{-\infty }^\infty w_s^2 u_s^k\,\frac{dx}{\varphi } + O\left( \frac{1}{n^{(s-1)/2}}\right) . \end{aligned}$$

Inserting this in (11.7), we thus arrive at

$$\begin{aligned} J_0 \, = \, \sum _{k=0}^{s-4} \, (-1)^k \int _{-\infty }^\infty w_s^2 u_s^k\,\frac{dx}{\varphi } + o\left( \frac{1}{T_n^{s-1}n^{(s-2)/2}}\right) . \end{aligned}$$
(11.8)

In the next step, we develop this representation by expressing \(u_s\) and \(w_s\) in terms of \(q_k\) while expanding the sum in (11.8) in powers of \(1/\sqrt{n}\) as

$$\begin{aligned} \sum _{j=2}^{s-2} \, \frac{a_j}{n^{j/2}} + O\left( \frac{1}{n^{(s-1)/2}}\right) . \end{aligned}$$

More precisely, here the coefficients are given by

$$\begin{aligned} a_j \, = \, \sum _{k=2}^{j}\, (-1)^k \int _{-\infty }^\infty (q_{r_1}^{\prime } + xq_{r_1})\, (q_{r_2}^{\prime } + xq_{r_2})\, q_{r_3} \cdots q_{r_k}\ \frac{dx}{\varphi ^{k - 1}} \end{aligned}$$
(11.9)

with summation over all positive solutions \((r_1,\ldots ,r_k)\) to \(r_1 + \cdots + r_k = j\). Moreover, when \(j\) is odd, the above integrals vanish. Indeed, differentiating the equality (10.1), which defines the functions \(q_k\), and using the property \(H_n^{\prime }(x) = n H_{n-1}(x)\) \((n \ge 1)\), we obtain a similar equality

$$\begin{aligned} q_k^{\prime }(x) + xq_k(x) \ = \, \varphi (x)\, \sum (k + 2l)\,H_{k + 2l - 1}(x) \, \frac{1}{r_1!\ldots r_k!}\, \left( \frac{\gamma _3}{3!}\right) ^{r_1} \ldots \left( \frac{\gamma _{k+2}}{(k+2)!}\right) ^{r_k} \nonumber \\ \end{aligned}$$
(11.10)

with summation over all non-negative solutions \((r_1,\ldots ,r_k)\) to \(r_1 + 2 r_2 + \cdots + k r_k = k\), and where \(l = r_1 + \cdots + r_k\). Hence, the integrand in (11.9) represents a linear combination of the functions of the form

$$\begin{aligned} H_{r_1 + 2l_1 - 1}\, H_{r_2 + 2l_2 - 1}\, H_{r_3 + 2l_3} \ldots H_{r_k + 2l_k}\, \varphi . \end{aligned}$$

Note that here the sum of indices has the same parity as \(j\). We can now apply the following property of the Chebyshev–Hermite polynomials (see [23]): if the sum of the indices \(d_1,\ldots , d_k\) is odd, then necessarily

$$\begin{aligned} \int _{-\infty }^{\infty } H_{d_1}(x) \ldots H_{d_k}(x) \, \varphi (x)\,dx = 0. \end{aligned}$$

Hence, \(a_j = 0\), when \(j\) is odd, and putting \(c_j = a_{2j}\), we arrive at the assertion of the lemma. \(\square \)

Remark

In formula (11.9) with \(c_j = a_{2j}\) we perform summation over all integers \(r_l \ge 1\) such that \(r_1 + \cdots + r_k = 2j\). Hence, all \(r_l \le 2j - 1\), and thus the functions \(q_{r_l}\) are determined by the cumulants up to order \(2j+1\). Therefore, \(c_j\) represents a polynomial in \(\gamma _3,\ldots ,\gamma _{2j+1}\).
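For instance, for \(j=1\) (so \(2j = 2\)) formula (11.9) involves only \(k=2\) and \(r_1 = r_2 = 1\), giving \(c_1 = a_2 = \int _{-\infty }^\infty (q_1^{\prime } + xq_1)^2\, \frac{dx}{\varphi }\); by (11.10) and the orthogonality of the Chebyshev–Hermite polynomials this evaluates to \(\gamma _3^2/2\). A short numerical confirmation (numpy/scipy; the value \(\gamma _3 = 2\) is just an example):

```python
import numpy as np
from scipy.integrate import quad

g3  = 2.0                                                  # example value of gamma_3
phi = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
q1  = lambda x: (g3 / 6.0) * (x**3 - 3 * x) * phi(x)       # (10.1) with k = 1
dq1 = lambda x: (g3 / 6.0) * ((3 * x**2 - 3) - x * (x**3 - 3 * x)) * phi(x)

# c_1 = a_2 = int (q_1' + x q_1)^2 / phi dx, cf. (11.9) with k = 2, r_1 = r_2 = 1.
# The integrand is negligible outside [-20, 20], so finite limits are used.
c1 = quad(lambda x: (dq1(x) + x * q1(x))**2 / phi(x), -20, 20)[0]
print(f"c_1 = {c1:.6f}   vs   gamma_3^2 / 2 = {g3**2 / 2:.6f}")
```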

12 Moderate deviations

We now consider the second integral

$$\begin{aligned} J_1 = \int _{|x|>T_n} \frac{(p_n^{\prime }(x)+xp_n(x))^2}{p_n(x)}\,dx \end{aligned}$$

participating in the Fisher information distance \(I(Z_n||Z)\).

Lemma 12.1

Let \(s \ge 3\) be an integer. If \(I(Z_{n_0}) < \infty \), for some \(n_0\), then

$$\begin{aligned} J_1 = o\left( \frac{1}{n^{(s-2)/2}(\log n)^{(s-3)/2}}\right) . \end{aligned}$$

Proof

Write

$$\begin{aligned} J_1 \, \le \, 2J_{1,1} + 2J_{1,2} \, = \, 2\int _{|x|>T_n} \frac{p_n^{\prime }(x)^2}{p_n(x)}\,dx + 2\int _{|x|>T_n} x^2p_n(x)\,dx. \end{aligned}$$
(12.1)

Using Lemma 10.1, we conclude that, for any integer \(s \ge 3\),

$$\begin{aligned} J_{1,2} = o\left( \frac{1}{(n\log n)^{(s-2)/2}}\right) . \end{aligned}$$
(12.2)

Indeed, integrating by parts we have

$$\begin{aligned} \int _{T_n}^\infty x^2 p_{n}(x)\,dx \, = \, T_n^2\,(1-F_n(T_n)) + 2\int _{T_n}^\infty x(1-F_n(x)) \,dx. \end{aligned}$$

Recalling the definition of the approximating functions \(\Phi _s\), cf. (10.2), and applying an elementary inequality \(1-\Phi (x) < \frac{1}{x}\,\varphi (x)\) (\(x>0\)), we get from (10.3) that

$$\begin{aligned} T_n^2\, (1-F_n(T_n))&= T_n^2\, (1-\Phi _s(T_n)) + T_n^2\, (\Phi _s(T_n)-F_n(T_n)) \\&\le T_n \varphi (T_n) + C\,\varphi (T_n)\, \sum _{k=1}^{s-2}\, T_n^{3k} n^{-k/2} + o\left( \frac{1}{T_n^{s-2}\, n^{(s-2)/2}}\right) \\&= o\left( \frac{1}{(n \log n)^{(s-2)/2}}\right) \end{aligned}$$

with some constant \(C\). In addition,

$$\begin{aligned} \int _{T_n}^\infty x(1-F_n(x)) \,dx&\le 1-\Phi (T_n) + C \sum _{k=1}^{s-2}\frac{1}{n^{k/2}}\int _{T_n}^\infty x^{3k}\varphi (x)\,dx \\&+ \, o\left( \frac{1}{T_{n}^{s-2}n^{(s-2)/2}}\right) \ = \ o\left( \frac{1}{(n \log n)^{(s-2)/2}}\right) . \end{aligned}$$

With similar estimates for the half-axis \(x<-T_n\), we arrive at the relation (12.2).
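For completeness, we note that the elementary bound \(1-\Phi (x) < \frac{1}{x}\,\varphi (x)\) used in these estimates follows by comparing the integrands:

$$\begin{aligned} 1-\Phi (x) \, = \, \int _x^\infty \varphi (t)\,dt \, < \, \int _x^\infty \frac{t}{x}\,\varphi (t)\,dt \, = \, \frac{\varphi (x)}{x} \qquad (x>0). \end{aligned}$$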

Let us now estimate \(J_{1,1}\). Denote by \(J_{1,1}^+\) the part of this integral corresponding to the interval \(x > T_n\). By Proposition 8.2 with \(a = T_n\) and \(b = \infty \), for sufficiently large \(n\) we have the formula

$$\begin{aligned} J_{1,1}^+ = - p_n^{\prime }(T_n) \log p_n(T_n) - \int _{T_n}^\infty p_n^{\prime \prime }(x)\log p_n(x)\,dx. \end{aligned}$$
(12.3)

Since \(p_n(x) \le \sqrt{I(Z_{n_0})}\) for all \(x\) (cf. Propositions 2.2 and 9.1), and since, by Lemma 10.2, \(p_n(T_n) \ge \frac{1}{2}\,\varphi (T_n)\), we see that, for all sufficiently large \(n\), \(|\log p_n(T_n)|\le c T_n^2\) with some constant \(c\). Therefore, by Lemma 10.2 applied to the derivative of the density \(p_n\), we get

$$\begin{aligned} |p_n^{\prime }(T_n) \log p_n(T_n)|&\le c T_n^2\, |p_n^{\prime }(T_n)| \nonumber \\&\le cT_n^2\, |\varphi ^{\prime }(T_n)| + o\left( \frac{1}{T_n^{s-2}\, n^{(s-2)/2}}\right) = o\left( \frac{1}{T_n^{s-3}\, n^{(s-2)/2}}\right) . \nonumber \\ \end{aligned}$$
(12.4)

A similar relation holds at the point \(-T_n\), as well.

It remains to evaluate the integral in (12.3). First we integrate over the set \(A = \{x > T_n: p_n(x)\le \varphi (x)^4\}\). By the upper bound of Proposition 6.4 and applying Proposition 9.1 once more, we have, for all \(x\) and all sufficiently large \(n\),

$$\begin{aligned} |p_n^{\prime \prime }(x)| \, \le \, I(p_n)^{5/4} \sqrt{p_n(x)} \, \le \, I(Z_{n_0})^{5/4} \sqrt{p_n(x)}. \end{aligned}$$

Hence, noting that the function \(t \mapsto \sqrt{t}\,|\log t|\) is increasing near zero, so that on \(A\) we have \(\sqrt{p_n(x)}\,|\log p_n(x)| \le \varphi (x)^2\,|\log \varphi (x)^4| \le c\,x^2\varphi (x)^2\), we get, with some constants \(c,c^{\prime }\),

$$\begin{aligned} \int _A |p_n^{\prime \prime }(x)\log p_n(x)|\,dx&\le c \int _A\sqrt{p_n(x)}\, |\log p_n(x)|\,dx \\&\le c^{\prime }\int _{T_n}^\infty x^2\varphi (x)^2\,dx \, = \, o\left( \frac{1}{n^{s-2}}\right) . \end{aligned}$$

On the other hand, on the complementary set \(B = (T_n,\infty ) \setminus A\) we have \(\varphi (x)^4 < p_n(x) \le \sqrt{I(Z_{n_0})}\), hence \(|\log p_n(x)| \le c\,x^2\) with some constant \(c\), so that

$$\begin{aligned} \int _B |p_n^{\prime \prime }(x)\log p_n(x)|\,dx \, \le \, c\int _B x^2\, |p_n^{\prime \prime }(x)|\,dx. \end{aligned}$$
(12.5)

We now apply Lemma 10.2 to approximate the second derivative. It yields

$$\begin{aligned} \int _{T_n}^{+\infty } x^2\, |p_n^{\prime \prime }(x)|\,dx \, \le \, \int _{T_n}^{+\infty } x^2\, |\varphi _s^{\prime \prime }(x)|\,dx + \int _{T_n}^\infty \frac{|\psi _{2,n}(x)|}{1+|x|^{s-2}}\,dx \cdot o\left( \frac{1}{n^{(s-2)/2}}\right) . \end{aligned}$$

Here, the first integral on the right-hand side is bounded by

$$\begin{aligned} \int _{T_n}^\infty x^2\, |\varphi _s^{\prime \prime }(x)-\varphi ^{\prime \prime }(x)|\,dx + \int _{T_n}^\infty x^2\, |x^2-1|\,\varphi (x)\,dx = o\left( \frac{1}{T_n^{s-3}n^{(s-2)/2}}\right) . \end{aligned}$$

To estimate the second integral, we use Cauchy’s inequality, which gives

$$\begin{aligned} \int _{T_n}^\infty \frac{1}{1 + |x|^{s-2}} \, |\psi _{2,n}(x)|\,dx \ \le \ \frac{1}{T_n^{s-5/2}} \ \left( \int _{\,\,-\infty }^\infty \psi _{2,n}(x)^2\,dx\right) ^{1/2} \ \le \ \frac{1}{T_n^{s-5/2}}. \end{aligned}$$

Therefore, returning to (12.5), we get

$$\begin{aligned} \int _B |p_n^{\prime \prime }(x)\log p_n(x)|\,dx \, = \, o\left( \frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\right) . \end{aligned}$$

Together with the bound for the integral over the set \(A\), we thus have

$$\begin{aligned} J_{1,1}^+ = o\left( \frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\right) . \end{aligned}$$

The part of the integral \(J_{1,1}\) taken over the half-axis \(x < -T_n\) admits a similar bound, and the lemma is proved. \(\square \)

The statement of Theorem 1.1 in case \(s \ge 3\) thus follows from Lemmas 11.1 and 12.1.

13 Theorem 1.1 in the case \(s=2\) and Corollary 1.2

In the most general case \(s=2\), the proof of Theorem 1.1 does not require Edgeworth-type expansions. With the tools developed in the previous sections, the argument is rather straightforward and may be viewed as an alternative approach to the Barron–Johnson theorem.

Proof of Theorem 1.1

(case \(s=2\)) Since the Fisher information \(I(Z_{n_0})\) is finite, the normalized sums \(Z_n\) with \(n \ge 2n_0\) have uniformly bounded densities \(p_n\) with bounded continuous derivatives \(p_n^{\prime }\) (Proposition 6.2). Moreover, we have a well-known local limit theorem for densities; one of its variants was described in Lemma 10.2. In particular,

$$\begin{aligned} \sup _x\ (1+x^2)\,|p_n(x)-\varphi (x)|&= o(1),\end{aligned}$$
(13.1)
$$\begin{aligned} \sup _x\ (1+x^2)\,|p_n^{\prime }(x)-\varphi ^{\prime }(x)|&= o(1), \end{aligned}$$
(13.2)

as \(n \rightarrow \infty \), where the convergence of the derivatives relies upon the finiteness of the Fisher information.

Splitting the integration in

$$\begin{aligned} I(Z_n||Z) = \int _{-\infty }^\infty \frac{(p_n^{\prime }(x) + xp_n(x))^2}{p_n(x)}\ dx \end{aligned}$$

into the two regions \(|x| \le T\) and \(|x| > T\), and noting that, by (13.1)–(13.2), \(p_n^{\prime }(x) + xp_n(x) \rightarrow \varphi ^{\prime }(x) + x\varphi (x) = 0\) uniformly on \([-T,T]\), while \(p_n\) remains bounded away from zero there for all \(n\) large enough, we have, for every fixed \(T>1\),

$$\begin{aligned} J_0 \, = \, \int _{|x|\le T}\frac{(p_n^{\prime }(x)+xp_n(x))^2}{p_n(x)}\,dx \, = \,o(1), \quad n\rightarrow \infty . \end{aligned}$$
(13.3)

On the other hand, as before, write

$$\begin{aligned} J_1&= \int _{|x|>T}\frac{(p_n^{\prime }(x)+xp_n(x))^2}{p_n(x)}\,dx \, \le \, 2 J_{1,1} + 2J_{1,2} \\&= 2\int _{|x|>T}\frac{p_n^{\prime }(x)^2}{p_n(x)}\,dx + 2\int _{|x|>T} x^2 p_n(x)\,dx. \end{aligned}$$

As we saw in (12.3),

$$\begin{aligned} J_{1,1} = -p_n^{\prime }(T)\log p_n(T) + p_n^{\prime }(-T)\log p_n(-T)- \int _{|x|>T} p_n^{\prime \prime }(x)\log p_n(x)\,dx. \end{aligned}$$

By (13.1)–(13.2), \(|p_n^{\prime }(\pm T)\log p_n(\pm T)|\le 2T^3e^{-T^2/2}\) for all sufficiently large \(n \ge n_T\). By Proposition 8.3, with some constant \(c\), for all \(x\),

$$\begin{aligned} |p_n^{\prime \prime }(x) \log p_n(x)| \, \le \, c\, \frac{\log (e+|x|)}{1+x^2}, \end{aligned}$$

implying

$$\begin{aligned} \int _{|x|>T} |p_n^{\prime \prime }(x)\log p_n(x)|\,dx \, \le \, c^{\prime } T^{-1/2} \end{aligned}$$

with some other constant \(c^{\prime }\). In addition, by (13.1),

$$\begin{aligned} \int _{|x|>T} x^2 p_n(x)\,dx&= \int _{|x|>T} x^2 (p_n(x) - \varphi (x))\,dx + \int _{|x|>T} x^2 \varphi (x)\,dx \\&= -\int _{|x| \le T} x^2 (p_n(x) - \varphi (x))\,dx + \int _{|x|>T} x^2 \varphi (x)\,dx \\&\le \int _{|x| \le T} x^2\, |p_n(x) - \varphi (x)|\,dx + \int _{|x|>T} x^2 \varphi (x)\,dx \ \le \ 2T^3\, o(1) + 4T\varphi (T). \end{aligned}$$

Hence, given \(\varepsilon >0\), one can choose \(T\) such that \(J_1 < \varepsilon \), for all \(n\) large enough. This means that \(J_1 = o(1)\), and recalling (13.3), we get \(I(Z_n||Z) = o(1)\). \(\square \)

Let us now return to the case \(s \ge 3\).

Proof of Corollary 1.2

According to the expansion (11.8) which appeared in the proof of Lemma 11.1, Theorem 1.1 may equivalently be formulated as

$$\begin{aligned} I(Z_n||Z) \, = \, \sum _{l=0}^{s-4} \, (-1)^l \int _{-\infty }^\infty w_s(x)^2 u_s(x)^l\,\frac{dx}{\varphi (x)} + o\left( \frac{1}{n^{(s-2)/2} \, (\log n)^{(s-3)/2}}\right) , \nonumber \\ \end{aligned}$$
(13.4)

where as before

$$\begin{aligned} w_s(x)= \sum _{j=1}^{s-2}\, (q^{\prime }_j(x)+xq_j(x))\,n^{-j/2}, \quad u_s(x)=\sum _{j=1}^{s-2}\, \frac{q_j(x)}{\varphi (x)} \, n^{-j/2}. \end{aligned}$$

This representation of the Fisher information distance is more convenient than (1.3) for applications such as Corollary 1.2. Assume that \(s \ge 4\) and \(\gamma _3 = \cdots = \gamma _{k-1} = 0\) for a given integer \(3 \le k \le s\) (with no restriction when \(k = 3\)). Then, by the definition (10.2), \(q_1 = \cdots = q_{k-3} = 0\), so

$$\begin{aligned} w_s(x) \, = \, \sum _{j=k-2}^{s-2}\, (q^{\prime }_j(x)+xq_j(x))\,n^{-j/2}, \quad u_s(x) \, = \, \sum _{j=k-2}^{s-2}\, \frac{q_j(x)}{\varphi (x)} \, n^{-j/2}. \nonumber \\ \end{aligned}$$
(13.5)

Hence, in order to isolate the leading term in (1.3) with the smallest power of \(1/n\), one should take \(l = 0\) in (13.4) and \(j = k-2\) in the first sum of (13.5). This gives

$$\begin{aligned} I(Z_n||Z)&= n^{-(k-2)} \int _{-\infty }^\infty \left( q^{\prime }_{k-2}(x) + x q_{k-2}(x)\right) ^2\,\frac{dx}{\varphi (x)} \\&+ O\left( n^{-(k-1)}\right) + o\left( \frac{1}{n^{(s-2)/2} \, (\log n)^{(s-3)/2}}\right) . \end{aligned}$$

Now, again according to (10.2), or as found in (11.10),

$$\begin{aligned} q^{\prime }_{k-2}(x) + x q_{k-2}(x) = \frac{\gamma _k}{(k-1)!}\,H_{k-1}(x)\, \varphi (x). \end{aligned}$$

Therefore, the sum in (1.3) will contain powers of \(1/n\) starting from \(1/n^{k-2}\) with leading coefficient

$$\begin{aligned} c_{k-2} = \frac{\gamma _k^2}{(k-1)!^{\,2}}\,\int _{-\infty }^\infty H_{k-1}(x)^2\, \varphi (x)\,dx = \frac{\gamma _k^2}{(k-1)!}. \end{aligned}$$

Thus, \(c_1 = \cdots = c_{k-3} = 0\) and we get

$$\begin{aligned} I(Z_n||Z) \, = \, \frac{\gamma _k^2}{(k-1)!}\, \frac{1}{n^{k-2}} + O\left( n^{-(k-1)}\right) + o\left( \frac{1}{n^{(s-2)/2} \, (\log n)^{(s-3)/2}}\right) . \end{aligned}$$

\(\square \)
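For instance, when \(\gamma _3 \ne 0\) (the case \(k=3\)), the leading term equals \(\frac{\gamma _3^2}{2n}\), while if \(\gamma _3 = 0\) and \(\gamma _4 \ne 0\) (the case \(k=4\)), it becomes \(\frac{\gamma _4^2}{6\,n^2}\).

As an informal numerical illustration (an editorial sketch, not part of the argument), one may test the case \(k=3\) on the example \(X_1 = E - 1\) with \(E\) standard exponential, for which \(\gamma _3 = 2\), so that the predicted leading term is \(2/n\). In this case \(Z_n\) has the explicit density of a standardized Gamma distribution, and \(I(Z_n||Z)\) may be evaluated by direct quadrature; the script below assumes that NumPy and SciPy are available.

```python
# Informal numerical check of the leading term 2/n in Corollary 1.2 for the
# illustrative example X_1 = E - 1, E ~ Exp(1) (so gamma_3 = 2); an editorial
# sketch, not part of the proof.  Here
#   Z_n = (E_1 + ... + E_n - n)/sqrt(n),
#   p_n(x)         = sqrt(n) * g_n(n + x*sqrt(n)),  g_n = Gamma(n,1) density,
#   p_n'(x)/p_n(x) = sqrt(n) * ((n-1)/(n + x*sqrt(n)) - 1).
import numpy as np
from scipy.stats import gamma


def relative_fisher_information(n, num_points=400001):
    """Approximate I(Z_n||Z) = int (p_n'/p_n + x)^2 p_n dx by a Riemann sum."""
    t = np.linspace(1e-6, n + 50.0 * np.sqrt(n), num_points)  # support of Gamma(n,1)
    x = (t - n) / np.sqrt(n)                                   # standardized variable
    p = np.sqrt(n) * gamma.pdf(t, a=n)                         # density p_n(x)
    score = np.sqrt(n) * ((n - 1.0) / t - 1.0)                 # p_n'(x)/p_n(x)
    dx = x[1] - x[0]
    return float(np.sum((score + x) ** 2 * p) * dx)


for n in (10, 100, 1000):
    print(n, relative_fisher_information(n), 2.0 / n)
```

In this example one may check directly that \(I(Z_n) = \frac{n}{n-2}\), so that \(I(Z_n||Z) = \frac{2}{n-2}\) for \(n > 2\), in agreement with the predicted asymptotics \(\frac{2}{n} + O(n^{-2})\); the quadrature merely illustrates the rate numerically.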

14 Extensions to non-integer \(s\). Lower bounds

If \(s \ge 2\) is not necessarily an integer, put \(m=[s]\) (the integer part of \(s\)). Theorem 1.1 admits the following generalization. As before, let the normalized sums

$$\begin{aligned} Z_n = \frac{X_1 + \cdots + X_n}{\sqrt{n}} \end{aligned}$$

be defined for independent identically distributed random variables with mean \(\mathbf{E}X_1=0\) and variance \(\mathrm{Var}(X_1)=1\).

Theorem 14.1

If \(I(Z_{n_0}) < \infty \) for some \(n_0\), and \(\mathbf{E}\, |X_1|^s < \infty \) \((s>2)\), then

$$\begin{aligned} I(Z_n||Z) = \frac{c_1}{n} + \frac{c_2}{n^{2}} + \cdots + \frac{c_{[(s-2)/2]}}{n^{[(s-2)/2]}} + o\left( \frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\right) , \end{aligned}$$
(14.1)

where the coefficients \(c_j\) are the same as in (1.4).

The proof is based on a certain extension and refinement of the local limit theorem described in Lemma 10.2.

Lemma 14.2

Assume that \(I(Z_{n_0}) < \infty \) for some \(n_0\), and let \(\mathbf{E}\, |X_1|^s < \infty \) \((s \ge 2)\). Fix an integer \(l \ge 0\). Then, for all \(n\) large enough, \(Z_n\) have densities \(p_n\) of class \(C^l\) satisfying, as \(n \rightarrow \infty \),

$$\begin{aligned} (1+|x|^m)\, \left( p_n^{(l)}(x)-\varphi _m^{(l)}(x)\right) \, = \, \psi _{l,n}(x)\,o(n^{-(s-2)/2}), \quad m = [s], \end{aligned}$$
(14.2)

uniformly for all \(x\), with  \(\sup _x\, |\psi _{l,n}(x)|\le 1\) and \(\int _{-\infty }^\infty \psi _{l,n}(x)^2\, dx \le 1\). Moreover,

$$\begin{aligned} (1+|x|^s)\,\left( p_n^{(l)}(x)-\varphi _m^{(l)}(x)\right)&= \psi _{l,n,1}(x)\,o(n^{-(s-2)/2}) \nonumber \\&\!+\! \ (1+|x|^{s-m})\,\psi _{l,n,2}(x)\, \left( O(n^{-(m-1)/2}) \!+\! o(n^{-(s-2)})\right) , \nonumber \\ \end{aligned}$$
(14.3)

uniformly for all \(x\), where  \(\sup _x\, |\psi _{l,n,j}(x)| \le 1\) and \(\int _{-\infty }^\infty \psi _{l,n,j}(x)^2\, dx \le 1\) \((j=1,2)\).

Here we use the approximating functions \(\varphi _m = \varphi + \sum _{k=1}^{m-2} q_k\, n^{-k/2}\) as before.

When \(l = 0\), and in a simpler form, namely with \(\psi _{0,n,j} \equiv 1\), this result has recently been obtained in [9, 10]. In this case, the finiteness of the Fisher information may be relaxed to the boundedness of the densities. The more general case involving derivatives can be carried out by an analysis similar to that developed in [9, 10], so we omit the details.

If \(s=m\) is an integer, the Edgeworth-type expansions (14.2) and (14.3) coincide, and we are reduced to the statement of Lemma 10.2. However, if \(s>m\), then (14.3) gives an improvement over (14.2) on relatively large intervals such as \(|x| \le T_n\) defined in (11.1).

Proof of Theorem 14.1

With a few modifications, one can argue in the same way as in the proof of Theorem 1.1. First, in the case \(l=0\), (14.3) yields, uniformly on \(|x| \le T_n\),

$$\begin{aligned} p_n(x) \, = \, \varphi _m(x) + \frac{1}{1+|x|^s}\, o\left( n^{-(s-2)/2}\right) , \end{aligned}$$

which being combined with a similar relation for the derivative \((l=1)\) yields

$$\begin{aligned} p_n^{\prime }(x) + x p_n(x) \, = \, w_m(x) + \frac{1}{1+|x|^{s-1}}\, o\left( n^{-(s-2)/2}\right) , \end{aligned}$$

where \(w_m(x) = \sum _{k=1}^{m-2}\, (q^{\prime }_k(x)+xq_k(x))\,n^{-k/2}\). These two relations thus extend (11.2) and (11.3) which were only needed in the proof of Lemma 11.1. Repeating the same arguments using the functions \(u_m(x) = \frac{\varphi _m(x)-\varphi (x)}{\varphi (x)}\), we can extend the expansion of Lemma 11.1 with the same remainder term to general values \(s > 2\).

In order to prove Lemma 12.1 for real \(s>2\), let us return to (12.1). The fact that the relation (12.2) extends to non-integer \(s\) follows from the extended variant of Lemma 10.1, which was already mentioned before. Hence our main concern is the integral \(J_{1,1}\), which gives the most essential contribution to the resulting remainder term. Consider the part of this integral on the positive half-axis

$$\begin{aligned} J_{1,1}^+ = \int _{T_n}^\infty \frac{p_n^{\prime }(x)^2}{p_n(x)}\,dx = - p_n^{\prime }(T_n) \log p_n(T_n) - \int _{T_n}^\infty p_n^{\prime \prime }(x)\log p_n(x)\,dx. \end{aligned}$$
(14.4)

Applying (14.3) at \(x=T_n\), we obtain (12.4) for real \(s>2\), that is,

$$\begin{aligned} \left| p_n^{\prime }(T_n) \log p_n(T_n)\right| = o\left( \frac{1}{n^{(s-2)/2}\, (\log n)^{(s-3)/2}}\right) . \end{aligned}$$

To prove (14.1), it remains to estimate the last integral in (14.4), which has to be treated with extra care. The argument uses both (14.2) and (14.3), which are applied on different parts of the half-axis \(x>T_n\). For the set \(A = \{x \ge T_n: p_n(x) \le \varphi (x)^4\}\) we have already obtained a general relation

$$\begin{aligned} \int _A |p_n^{\prime \prime }(x)\log p_n(x)|\,dx \, = \, o\left( \frac{1}{n^{s-2}}\right) , \end{aligned}$$

which holds for all sufficiently large \(n\) (without any moment assumption). Hence, with some constant \(c\),

$$\begin{aligned} \int _{T_n}^{4T_n^4} |p_n^{\prime \prime }(x)\log p_n(x)|\,dx \, \le \, c\int _{T_n}^{4T_n^4} x^2\, |p_n^{\prime \prime }(x)|\,dx + o\left( \frac{1}{n^{s-2}}\right) . \end{aligned}$$
(14.5)

Now, on the interval \([T_n,4T_n^4]\) we apply Lemma 14.2 with \(l=2\) to approximate the second derivative. It yields

$$\begin{aligned} \int _{T_n}^{4T_n^4} x^2\, |p_n^{\prime \prime }(x)|\,dx&\le \int _{T_n}^\infty x^2\, |\varphi _m^{\prime \prime }(x)|\,dx \!+\! \int _{T_n}^{4T_n^4} \frac{|\psi _{2,n,1}(x)|}{1\!+\!|x|^{s-2}}\,dx \cdot o\left( \frac{1}{n^{(s-2)/2}}\right) \\&+ \int _{T_n}^{4T_n^4}\frac{1}{1\!+\!|x|^{m-2}}\,|\psi _{2,n,2}(x)|\,dx \cdot \left( O(n^{-(m-1)/2}) \!+\! o(n^{-(s-2)})\right) . \end{aligned}$$

Here, as in the proof of Lemma 12.1, the first integral on the right-hand side is bounded, up to a constant, by

$$\begin{aligned} \int _{T_n}^{+\infty } x^4\varphi (x)\,dx = o\left( \frac{1}{T_n^{s-3}n^{(s-2)/2}}\right) , \end{aligned}$$

and for the second one, we use Cauchy’s inequality to estimate it by \(T_n^{-(s-5/2)}\). Similarly, the last integral is bounded by

$$\begin{aligned} 2 T_n^2\,\left( \int _{-\infty }^\infty \psi _{2,n,2}(x)^2\,dx\right) ^{1/2} \, \le \, 2T_n^2. \end{aligned}$$

Since \(T_n^2\) has a logarithmic growth, we conclude that

$$\begin{aligned} \int _{T_n}^{4T_n^4} x^2\, |p_n^{\prime \prime }(x)|\,dx = o\left( \frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\right) , \end{aligned}$$

so a similar bound also holds for the integral on the left-hand side of (14.5).

To deal with the remaining values of \(x\), we will consider the set \(S_1 = \big \{x > 4T_n^4: p_n(x) \le \frac{1}{2}\,e^{-4\sqrt{x}}\,\big \}\) and its complement \(S_2 = (4T_n^4,\infty ){\setminus }S_1\). By Proposition 6.3, for all sufficiently large \(n\), and with some constants \(c,c^{\prime }\) we have

$$\begin{aligned} \int _{S_1} |p_n^{\prime \prime }(x)\log p_n(x)|\,dx&\le c \int _{S_1} \sqrt{p_n(x)}\, |\log p_n(x)|\,dx \\&\le c^{\prime } \int _{4T_n^4}^\infty \sqrt{x}\, e^{-2\sqrt{x}}\,dx =o\left( \frac{1}{n^{s-2}}\right) . \end{aligned}$$

On the other hand, applying (14.2) on the set \(S_2\), we get

$$\begin{aligned} \int _{S_2} |p_n^{\prime \prime }(x)\log p_n(x)|\,dx&\le c \int _{S_2} |p_n^{\prime \prime }(x)| \sqrt{x}\,dx \\&\le c^{\prime } \int _{4T_n^4}^\infty x^{5/2} \varphi (x)\,dx + c^{\prime } \int _{4T_n^4}^\infty \frac{dx}{x^{m-1/2}} \cdot o\left( \frac{1}{n^{(s-2)/2}}\right) \\&= o\left( \frac{1}{T_n^{2(2m-3)} n^{(s-2)/2}}\right) . \end{aligned}$$

Combining these estimates, we complete the proof of the theorem. \(\square \)

Remark.

If \(2<s<4\), the expansion (14.1) becomes

$$\begin{aligned} I(Z_n||Z) = o\left( \frac{1}{n^{(s-2)/2}\,(\log n)^{(s-3)/2}}\right) . \end{aligned}$$
(14.6)

This formulation does not include the case \(s=2\). For \(s>2\), we expect that the bound (14.6) may be improved further; however, a possible improvement may concern only the power of the logarithmic term. This can be illustrated by the example of densities of the form

$$\begin{aligned} p(x) = \int _{\sigma _0}^\infty \varphi _\sigma (x)\,dP(\sigma ) \quad (x \in \mathbf{R}), \end{aligned}$$

that is, mixtures of densities of normal distributions on the line with mean zero, where \(P\) is a (mixing) probability measure supported on the half-axis \((\sigma _0,\infty )\) with \(\sigma _0 > 0\). The variance constraint on \(P\) is that

$$\begin{aligned} \int _{-\infty }^\infty x^2 p(x)\,dx = \int _{\sigma _0}^\infty \sigma ^2\,dP(\sigma ) = 1, \end{aligned}$$
(14.7)

so we should assume that \(0 < \sigma _0 < 1\).

First, let us note that, by the convexity of the Fisher information,

$$\begin{aligned} I(p) \le \int _{\sigma _0}^\infty I(\varphi _\sigma )\,dP(\sigma ) = \int _{\sigma _0}^\infty \frac{1}{\sigma ^2}\,dP(\sigma ) \le \frac{1}{\sigma _0^2}, \end{aligned}$$

hence \(I(p)\) is finite. On the other hand, given \(\eta > s/2\), it is possible to construct the measure \(P\) so as to satisfy (14.7) and such that

$$\begin{aligned} D(Z_n||Z)\, \ge \, \frac{c}{n^{(s-2)/2} \, (\log n)^\eta }, \end{aligned}$$

for all \(n\) large enough, with a constant \(c\) depending only on \(s\) and \(\eta \) (cf. [11]). For example, one may define \(P\) on the half-axis \([2,\infty )\) by its density

$$\begin{aligned} \frac{dP(\sigma )}{d\sigma } = \frac{c}{\sigma ^{s+1} (\log \sigma )^\eta }, \quad \sigma > 2, \end{aligned}$$

and then extend it to an interval \([\sigma _0,2]\) in an arbitrary way so as to obtain a probability measure satisfying the requirement (14.7). Hence, (14.6) is sharp up to a logarithmic factor.
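Note that, with this choice,

$$\begin{aligned} \int _2^\infty \sigma ^2\,dP(\sigma ) \, = \, c\int _2^\infty \frac{d\sigma }{\sigma ^{s-1} (\log \sigma )^\eta } \, < \, \infty , \qquad \int _2^\infty \sigma ^s\,dP(\sigma ) \, = \, c\int _2^\infty \frac{d\sigma }{\sigma \,(\log \sigma )^\eta } \, < \, \infty , \end{aligned}$$

since \(s>2\) and \(\eta > s/2 > 1\). In particular, the mixture has a finite absolute moment of order \(s\), while the total mass and the second moment may be adjusted on \([\sigma _0,2]\) so as to fulfil (14.7).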

Finally, let us mention that in the case \(s=2\), \(D(Z_n||Z)\), and therefore \(I(Z_n||Z)\), may decay at an arbitrarily slow rate.