
Convergence rates in expectation for Tikhonov-type regularization of inverse problems with Poisson data

Published 1 October 2012 © 2012 IOP Publishing Ltd
Citation: Frank Werner and Thorsten Hohage 2012 Inverse Problems 28 104004. DOI 10.1088/0266-5611/28/10/104004


Abstract

In this paper, we study a Tikhonov-type method for ill-posed nonlinear operator equations g† = F(u†), where g† is an integrable, non-negative function. We assume that data are drawn from a Poisson process with density tg†, where t > 0 may be interpreted as an exposure time. Such problems occur in many photonic imaging applications including positron emission tomography, confocal fluorescence microscopy, astronomical observations and phase retrieval problems in optics. Our approach uses a Kullback–Leibler-type data fidelity functional and allows for general convex penalty terms. We prove convergence rates of the expectation of the reconstruction error under a variational source condition as t → ∞ both for an a priori and for a Lepskii-type parameter choice rule.


1. Introduction

We consider inverse problems where the ideal data can be interpreted as a photon density $g^\dagger \in \mathbf {L}^1(\mathbb {M})$ on some manifold $\mathbb {M}$. The unknown will be described by an element u† of a subset $\mathfrak {B}$ of a Banach space $\mathcal {X}$, and u† and g† are related by a forward operator F mapping from $\mathfrak {B}$ to $\mathbf {L}^1(\mathbb {M})$:

Equation (1): $F(u^\dagger ) = g^\dagger .$

The data will be drawn from a Poisson process with density tg†, where t > 0 can often be interpreted as an exposure time. We refer to section 2 for the definition of a Poisson process and the motivation of this model. Here, we only point out that Poisson data can be seen as a random collection of points on the manifold $\mathbb {M}$ on which measurements are taken. Hence, unlike in the common deterministic setup, the data do not belong to the same space as the ideal data g†. Note that the exposure time t is proportional to the expected number of collected points. Thus, the larger t is, the more accurate our information on g† and hence on u† becomes, and we will study the limit t → ∞.
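For readers who prefer a computational picture of this data model, the following minimal Python sketch draws such a random point set for a hypothetical one-dimensional density on $\mathbb {M}=[0,1]$ by thinning a homogeneous Poisson process; the density, the domain and all names are illustrative assumptions rather than part of the setting described above.

import numpy as np

def draw_poisson_points(g_dagger, t, g_max, rng):
    """Sample a Poisson process with intensity t*g_dagger on [0,1] by thinning."""
    # dominating homogeneous process with constant intensity t*g_max
    n_candidates = rng.poisson(t * g_max)
    x = rng.uniform(0.0, 1.0, size=n_candidates)
    # keep each candidate point x with probability g_dagger(x)/g_max
    return x[rng.uniform(0.0, 1.0, size=n_candidates) < g_dagger(x) / g_max]

g_dagger = lambda x: 2.0 * np.exp(-8.0 * (x - 0.5) ** 2)   # hypothetical photon density
rng = np.random.default_rng(0)
for t in (10, 100, 1000):
    points = draw_poisson_points(g_dagger, t, g_max=2.0, rng=rng)
    print(t, points.size)   # the expected count is t * ||g_dagger||_{L^1}, so it grows linearly in t

The number of detected points grows linearly with the exposure time, which is the asymptotic regime in which the convergence rates below are formulated.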

Such inverse problems occur naturally in photonic imaging since photon count data are Poisson distributed for fundamental physical reasons. Examples include inverse problems in astronomy [4], fluorescence microscopy, in particular 4Pi microscopy [28], coherent x-ray imaging [18] and positron emission tomography [8].

In this paper, we study a penalized likelihood or Tikhonov-type estimator

Equation (2): $u_{\alpha } \in \operatorname{argmin}_{u\in \mathfrak {B}}\left[\mathcal {S}\left(G_t;F(u)\right) + \alpha \mathcal {R}(u)\right].$

Here Gt describes the observed data, $\mathcal {S}$ is a Kullback–Leibler-type data misfit functional derived in section 2, α > 0 is a regularization parameter and $\mathcal {R}:\mathcal {X}\rightarrow \left(-\infty , \infty \right]$ is a convex penalty term, which may incorporate a priori knowledge about the unknown solution u†. If $\mathcal S\left(g_1;g_2\right) = \Vert g_1-g_2\Vert _{\mathcal {Y}}^2$ and $\mathcal {R}(u) = \Vert u-u_0\Vert _{\mathcal {X}}^2$ with Hilbert space norms $\Vert \cdot \Vert _{\mathcal {X}}$ and $\Vert \cdot \Vert _{\mathcal {Y}}$, then (2) is a standard Tikhonov regularization. In many cases, estimators of the form (2) can be interpreted as maximum a posteriori estimators in a Bayesian framework, but our convergence analysis will follow a frequentist paradigm, and in particular u† will be considered as a deterministic quantity. Our data misfit functional $\mathcal S$ will be convex in its second argument, so the minimization problem (2) will be convex if F is linear.
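To make (2) concrete, the following Python sketch minimizes a discretized functional of this type for a hypothetical linear smoothing operator on a grid over [0,1]; the fidelity below is one plausible discretization of the shifted Kullback–Leibler-type functional introduced in section 2 below, and the operator, penalty, offset σ and all parameter values are illustrative assumptions only.

import numpy as np
from scipy.optimize import minimize

n, t, sigma, alpha = 100, 1000.0, 1e-3, 1e-2
h = 1.0 / n
xs = (np.arange(n) + 0.5) * h
K = np.exp(-(xs[:, None] - xs[None, :]) ** 2 / (2 * 0.05 ** 2))     # smoothing kernel
forward = lambda u: h * (K @ u)                                     # hypothetical linear operator F

u_true = np.maximum(0.0, 1.0 - 16.0 * (xs - 0.5) ** 2)
counts = np.random.default_rng(1).poisson(t * forward(u_true) * h)  # binned Poisson data

def tikhonov(u):
    g = forward(u)
    # shifted KL-type fidelity: int g dx - int ln(g + sigma) dG_t, with G_t approximated by counts/t
    fidelity = h * g.sum() - (counts / t * np.log(g + sigma)).sum()
    penalty = h * np.sum(u ** 2)                                     # a simple convex penalty R(u)
    return fidelity + alpha * penalty

res = minimize(tikhonov, x0=np.full(n, 0.5), method="L-BFGS-B",
               bounds=[(0.0, None)] * n)                             # u >= 0 keeps F(u) >= 0
u_alpha = res.x

Replacing the quadratic term by another convex functional (e.g. a total variation or entropy term) changes only the penalty in the objective; the analysis below covers such general convex penalties.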

Recently, considerable progress has been achieved in the deterministic analysis of variational regularization methods in Banach spaces [6, 7, 10, 14–16, 27]. In particular, a number of papers have been devoted to the Kullback–Leibler divergence as the data fidelity term in (2), motivated by the case of Poisson data (see [2, 3, 10, 11, 24, 25]), but all of them under deterministic error assumptions. On the statistical side, inverse problems for Poisson data have been studied by Antoniadis and Bigot [1] by wavelet Galerkin methods. Their study is restricted to linear operators with favorable mapping properties in certain function spaces. Therefore, there is a need for a statistical convergence analysis for inverse problems with Poisson data involving more general and in particular nonlinear forward operators. This is the aim of this paper.

Our convergence analysis of the estimator (2) is based on two basic ingredients. The first is a uniform concentration inequality for Poisson data (theorem 2.1), which will be formulated together with some basic properties of Poisson processes in section 2. The proof of theorem 2.1, which is based on results by Reynaud–Bouret [26], is given in an appendix. The second ingredient, presented in section 3, is a deterministic error analysis of (2) for general $\mathcal S$ under a variational source condition (theorem 3.3). Our main results are two estimates of the expected reconstruction error as the exposure time t tends to ∞: for an a priori choice of α, which requires knowledge of the smoothness of u†, such a result is shown in theorem 4.3. Finally, a convergence rate result for a completely adaptive method, where α is chosen by a Lepskii-type balancing principle, is presented in theorem 5.1.

2. Results on Poisson processes

Let $\mathbb {M}\subset \mathbb {R}^d$ be a submanifold where measurements are taken, and let $\lbrace x_1,\dots ,x_N\rbrace \subset \mathbb {M}$ denote the positions of the detected photons. Both the total number N of observed photons and the positions $x_i\in \mathbb {M}$ of the photons are random. It will turn out convenient (see e.g. equation (3)) to describe the set of points by a sum of Dirac point measures $G := \sum _{i=1}^{N}\delta _{x_i}$. It is physically evident that this measure fulfils the following two properties.

  • (i)  
    For all measurable subsets $\mathbb {M}^{\prime }\subset \mathbb {M}$, the integer-valued random variable $G(\mathbb {M}^{\prime })= \#\lbrace i | x_i\in \mathbb {M}^{\prime }\rbrace$ has expectation $\mathbf {E}[G(\mathbb {M}^{\prime })]=\int _{\mathbb {M}^{\prime }}g^\dagger \,\mathrm{d}x$, where $g^\dagger \in L^1(\mathbb {M})$ denotes the underlying photon density.
  • (ii)  
    For any choice of m disjoint measurable subsets $\mathbb {M}_1^{\prime },\dots ,\mathbb {M}_m^{\prime }\subset \mathbb {M}$ the random variables $G (\mathbb {M}^{\prime }_1),\dots , G(\mathbb {M}^{\prime }_m)$ are stochastically independent.

By definition, a point process (i.e. a random set of points) is called a Poisson process with intensity g† if it satisfies the properties (i) and (ii). It follows from these properties that $G(\mathbb {M}^{\prime })$ for any measurable $\mathbb {M}^{\prime }\subset \mathbb {M}$ is Poisson distributed with mean $\lambda := \mathbf {E}[G(\mathbb {M}^{\prime })]$, i.e. $\mathbf {P}[G(\mathbb {M}^{\prime })=n] = \exp (-\lambda )\frac{\lambda ^n}{n!}$ for all n ∈ {0, 1, ...} (see e.g. [19, theorem 1.11.8]). Moreover, for any measurable $\psi :\Omega \rightarrow \mathbb {R}$ we have

Equation (3): $\mathbf {E}\left[\int _{\mathbb {M}}\psi \,\mathrm {d}G\right] = \int _{\mathbb {M}}\psi \,g^\dagger \,\mathrm {d}x, \qquad \mathbf {Var}\left[\int _{\mathbb {M}}\psi \,\mathrm {d}G\right] = \int _{\mathbb {M}}\psi ^2\,g^\dagger \,\mathrm {d}x$

whenever the right-hand sides are well defined (see [20, equations (3.9) and (3.10)]).
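As a quick numerical sanity check of these properties, the following Python snippet estimates the mean and the variance of $G(\mathbb {M}^{\prime })$ for a hypothetical density on [0,1] and the subinterval $\mathbb {M}^{\prime }=[0.5,1]$; both should be close to $\int _{\mathbb {M}^{\prime }}g^\dagger \,\mathrm {d}x$, as expected for a Poisson distributed count. All numerical details are illustrative assumptions.

import numpy as np
rng = np.random.default_rng(2)

g_dagger = lambda x: 3.0 * x ** 2          # hypothetical density on [0,1] with ||g_dagger||_{L^1} = 1
g_max = 3.0

def draw_points():
    n = rng.poisson(g_max)                 # thinning of a homogeneous process with intensity g_max
    x = rng.uniform(0.0, 1.0, n)
    return x[rng.uniform(0.0, 1.0, n) < g_dagger(x) / g_max]

# counts on M' = [0.5, 1]; the exact mean is the integral of 3 x^2 over [0.5, 1], i.e. 0.875
counts = np.array([(draw_points() >= 0.5).sum() for _ in range(20000)])
print(counts.mean(), counts.var())         # both close to 0.875 (mean equals variance for Poisson)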

Let us introduce for each exposure time t > 0 a Poisson process $\tilde{G}_t$ with intensity tg† and define $G_t:= \frac{1}{t} \tilde{G}_t$. We will study error estimates for approximate solutions to the inverse problem (1) with data Gt in the limit t → ∞. To this end, it will be necessary to derive estimates on the distribution of the negative log-likelihood functional

Equation (4): $\mathcal {S}\left(G_t;g\right) := \int _{\mathbb {M}} g\,\mathrm {d}x - \int _{\mathbb {M}} \ln (g)\,\mathrm {d}G_t,$

which is defined for functions g fulfilling g ⩾ 0 a.e. We set ln 0 := −∞, so $\mathcal S\left(G_t;g\right)=\infty$ if g(xi) = 0 for some i = 1, ..., N. Using (3), we obtain

Equation (5): $\mathbf {E}\left[\mathcal {S}\left(G_t;g\right)\right] = \int _{\mathbb {M}} g\,\mathrm {d}x - \int _{\mathbb {M}} g^\dagger \ln (g)\,\mathrm {d}x$

if the integrals exist. Moreover, we have

and the right-hand side (if well defined) is known as Kullback–Leibler divergence

Equation (6)

$\mathbb {KL}(g^\dagger ;g)$ can be seen as the ideal data misfit functional if the exact data g† were known. Since only Gt is given, we approximate $\mathbb {KL}(g^\dagger ;g)$ by $\mathcal S(G_t;g)$ up to the additive constant $\mathbf {E}[\mathcal S(G_t;g^\dagger )]$, which is independent of g. The error between the estimated and the ideal data misfit functionals is given by

Equation (7)
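The relation between the empirical functional (4) and the ideal misfit (6) can also be explored numerically. The Python fragment below evaluates $\mathcal S(G_t;g)-\mathcal S(G_t;g^\dagger )$ for simulated data and compares it with $\mathbb {KL}(g^\dagger ;g)$; for large t the two values nearly agree, in line with (7). The grid, the densities and the binned approximation of the point process are illustrative assumptions.

import numpy as np
rng = np.random.default_rng(3)

xs = np.linspace(0.0, 1.0, 400)                   # grid on M = [0,1]
h = xs[1] - xs[0]
g_dag = 1.0 + 0.5 * np.sin(2 * np.pi * xs)        # hypothetical exact density
g = 1.2 + 0.3 * np.cos(2 * np.pi * xs)            # a candidate density

def S(points, g_vals, t):
    """Empirical fidelity (4): int g dx - int ln(g) dG_t, with G_t = (1/t) * sum of Dirac measures."""
    return h * g_vals.sum() - np.log(np.interp(points, xs, g_vals)).sum() / t

def KL(f, g_vals):
    """Kullback-Leibler divergence (6) of the densities f and g_vals on the grid."""
    return h * np.sum(g_vals - f - f * np.log(g_vals / f))

t = 5000.0
counts = rng.poisson(t * g_dag * h)               # photon counts per grid cell
points = np.repeat(xs, counts)                    # approximate photon positions (cell centres)
print(S(points, g, t) - S(points, g_dag, t), KL(g_dag, g))   # nearly equal for large t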

Based on results by Reynaud–Bouret [26], which can be seen as an analogue to Talagrand's concentration inequalities for empirical processes, we will derive the following concentration inequality for such error terms in the appendix.

Theorem 2.1. Let $\mathbb {M}\subset \mathbb R^d$ be a bounded domain with Lipschitz boundary $\partial \mathbb {M}$, R ⩾ 1 and $s > \frac{d}{2}$. Consider the ball

$B_s(R) := \left\lbrace \mathfrak {g}\in H^s(\mathbb {M}) \,\big |\, \Vert \mathfrak {g}\Vert _{H^s(\mathbb {M})}\le R\right\rbrace .$
Then there exists a constant Cconc ⩾ 1 depending only on $\mathbb {M}$, s and $\Vert g^\dagger \Vert _{\mathbf {L}^1(\mathbb {M})}$ such that

for all t ⩾ 1 and ρ ⩾ RCconc.

To apply this concentration inequality to the right-hand side of (7), we would need ln (F(u)) ∈ Bs(R) for all $u\in \mathfrak {B}$. However, since $\sup _{\mathfrak {g}\in B_s(R)}\Vert \mathfrak {g}\Vert _{\infty }<\infty$ by Sobolev's embedding theorem, zeros of F(u) for some $u\in \mathfrak {B}$ would not be admissible, which is a quite restrictive assumption. Therefore, we use a shifted version of the data fidelity term with an offset parameter σ > 0:

Equation (8)

Equation (9)

Then, the error is given by

Equation (10)

We will show in section 4 that theorem 2.1 can be used to estimate the concentration of $\sup _{u\in \mathfrak {B}} Z(F(u))$ under certain assumptions.

3. A deterministic convergence rate result

In this section, we will perform a convergence analysis for method (2) with general $\mathcal S$ under a deterministic noise assumption. Similar results have been obtained by Flemming [10, 11], Grasmair [14], and Boţ and Hofmann [6] under different assumptions on $\mathcal {S}$.

As in section 2, we will consider the 'distance' between the estimated and the ideal data misfit functionals as the noise level.

Assumption 1. Let $u^\dagger \in \mathfrak B \subset \mathcal {X}$ be the exact solution and denote by $g^\dagger := F(u^\dagger ) \in \mathcal {Y}$ the exact data. Let $\mathcal {Y}^{\rm obs}$ be a set containing all possible observations and $g^{\rm obs}\in \mathcal {Y}^{\rm obs}$ the observed data. Assume that

  • (i)  
    The exact data fidelity functional $\mathcal {T}: F(\mathfrak {B}) \times F(\mathfrak {B}) \rightarrow [0,\infty ]$ is non-negative, and $\mathcal {T}(g^\dagger ,g^\dagger )=0$.
  • (ii)  
    For the empirical data fidelity term $\mathcal {S}: \mathcal {Y}^{\rm obs}\times F(\mathfrak {B}) \rightarrow (-\infty ,\infty ]$ with the given data $g^{\rm obs}\in \mathcal {Y}^{\rm obs}$ there exist constants $\operatorname{\mathbf {err}}\ge 0$ and Cerr ⩾ 1 such that
    Equation (11)
    for all $g \in F(\mathfrak B)$.

Example 3.1. 

  • Classical deterministic noise model. If $\mathcal S(g;\hat{g}) = \mathcal {T}(g;\hat{g}) = \Vert g-\hat{g}\Vert _{\mathcal {Y}}^r$, then we obtain from |ab|r ⩾ 21 − rarbr that (11) holds true with Cerr = 2r − 1 and $\operatorname{\mathbf {err}}= 2\Vert g^\dagger - g^{\rm obs}\Vert _{\mathcal {Y}}^r$. Thus, assumption 1 covers the classical deterministic noise model.
  • Poisson data. For the case of $\mathcal S$ and $\mathcal T$ as in (8) and (9) it can be seen from elementary calculations that (11) with Cerr = 1 requires
    Equation (12)
    for all $u \in \mathfrak B$. Consequently (11) holds true with Cerr = 1 if $\operatorname{\mathbf {err}}/2$ is an upper bound for the integrals in (10) with $g = F(u), u \in \mathfrak B$. We will show that theorem 2.1 ensures that this holds true for $\operatorname{\mathbf {err}}/2=\frac{\rho }{\sqrt{t}}$ with probability ⩾1 − exp ( − cρ) for some constant c > 0 (cf corollary 4.2).

In a previous study of Newton-type methods for inverse problems with Poisson data [18], the authors had to use a slightly stronger assumption on the noise level involving a second inequality. Let us denote the error functional from [18, assumption 2] by $f_{\operatorname{\mathbf {err}}} (g)$, $g \in \mathcal {Y}$. Then [18, assumption 2] implies (11) with $\operatorname{\mathbf {err}}= (1+C_{\rm err}) \sup _{u \in \mathfrak B} f_{\operatorname{\mathbf {err}}}(F(u))$ provided this value is finite. On the other hand, (11) allows that $\mathcal S(g^{\rm obs};g) = \infty$ even if $\mathcal {T}(g^\dagger ;g) < \infty$, which is impossible in [18, assumption 2] if $f_{\operatorname{\mathbf {err}}}(g) < \infty$.

To measure the smoothness of the unknown solution, we will use a source condition in the form of a variational inequality, which was introduced by Hofmann et al [16] for the case of a Hölder-type source condition with index $\nu = \frac{1}{2}$ and generalized in many recent publications [6, 10, 13, 14, 17]. For its formulation, we need the Bregman distance. For a subgradient $u^* \in \partial \mathcal {R}(u^\dagger )\subset \mathcal {X}^*$ (e.g. u* = u† − u0 if $\mathcal {R}(u) = \frac{1}{2}\Vert u-u_0\Vert _{\mathcal {X}}^2$ with a Hilbert norm $\Vert \cdot \Vert _{\mathcal {X}}$) the Bregman distance of $\mathcal {R}$ between u and u† w.r.t. u* is given by

$\mathcal D_{\mathcal {R}}^{u^*}(u,u^\dagger ) := \mathcal {R}(u) - \mathcal {R}(u^\dagger ) - \langle u^*, u - u^\dagger \rangle , \qquad u \in \mathcal {X}.$
In the aforementioned example of $\mathcal {R}(u) = \frac{1}{2}\Vert u-u_0\Vert _{\mathcal {X}}^2$ for a Hilbert space norm $\Vert \cdot \Vert _{\mathcal {X}}$ we have $\mathcal D_{\mathcal {R}}^{u^*}(u,u^\dagger ) = \frac{1}{2}\Vert u-u^\dagger \Vert _{\mathcal {X}}^2$. In this sense, the Bregman distance is a natural generalization of the norm. We will use the Bregman distance also to measure the error of our approximate solutions.
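For the Hilbert space example just mentioned this is a one-line computation, recorded here for convenience: with $u^* = u^\dagger - u_0$,

$\mathcal D_{\mathcal {R}}^{u^*}(u,u^\dagger ) = \tfrac{1}{2}\Vert u-u_0\Vert _{\mathcal {X}}^2 - \tfrac{1}{2}\Vert u^\dagger -u_0\Vert _{\mathcal {X}}^2 - \langle u^\dagger -u_0,\, u-u^\dagger \rangle = \tfrac{1}{2}\Vert u-u^\dagger \Vert _{\mathcal {X}}^2 .$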

Now we are able to formulate our assumption on the smoothness of u†.

Assumption 2 (Variational source condition). $\mathcal {R}:\mathcal {X}\rightarrow (-\infty ,\infty ]$ is a proper convex functional and there exist $u^*\in \partial \mathcal {R}(u^\dagger )$, a parameter β > 0 and an index function φ (i.e. φ monotonically increasing, φ(0) = 0) such that φ is concave and

Equation (13)

Example 3.2. Let ψ be an index function such that ψ² is concave, and let $F:D\left(F\right) \subset \mathcal {X}\rightarrow \mathcal {Y}$ be Fréchet differentiable with open domain D(F), mapping between Hilbert spaces $\mathcal {X}$ and $\mathcal {Y}$, with Fréchet derivative F'[ · ]. Flemming [11, 13] has shown that

Equation (14): $u^\dagger - u_0 = \psi \left(F^{\prime }[u^\dagger ]^* F^{\prime }[u^\dagger ]\right)\omega$ for some $\omega \in \mathcal {X}$

together with the tangential cone condition $\Vert F^{\prime } [u^\dagger ] (u-u^\dagger )\Vert _{\mathcal {Y}} \le \eta \Vert F(u) - F(u^\dagger )\Vert _{\mathcal {Y}}$ implies the variational inequality

Equation (15)

for all uD(F). Here φψ is another index function depending on ψ, and for the most important cases of Hölder-type and logarithmic source conditions the implications

Equation (16a)

Equation (16b)

hold true with some constants $\tilde{\beta }, \bar{\beta }$ where p > 0 and $\nu \in \big(0,\frac{1}{2}\big]$ (see Hofmann and Yamamoto [17, proposition 6.6] and Flemming [11, section 13.5.2] respectively).

In the case of $\psi (\tau ) = \sqrt{\tau }$ and a closed and convex set $\mathfrak B\subset \mathcal {X}$, the variational inequality (15) for $u \in \mathfrak B$ with $\varphi _\psi (\tau ) = \tilde{\beta }\sqrt{\tau }$, $\tilde{\beta }> 0$ should correctly be compared with the projected source condition $u^\dagger =P_{\mathfrak B}(u_0 + F^{\prime }[u^\dagger ]^* \omega )$, where $P_{\mathfrak B}$ denotes the metric projector onto $\mathfrak B$. Flemming and Hofmann [12] have shown that both conditions are almost equivalent. Thus, assumption 2 is in general weaker than a spectral source condition, and for special choices of φ one obtains an almost equivalent formulation of the (projected) spectral source condition.

With the notation (10) of the error, we are able to perform a deterministic convergence analysis including an error decomposition. Following Grasmair [14], we use the Fenchel conjugate of −φ to bound the approximation error. Recall that the Fenchel conjugate f* of a function $f: \mathbb R \rightarrow (-\infty ,\infty ]$ is given by

$f^*(s) := \sup _{\tau \in \mathbb {R}}\left(s\tau - f(\tau )\right), \qquad s\in \mathbb {R}.$
f* is always convex as a supremum of the affine-linear (and hence convex) functions s↦sτ − f(τ). Setting φ(τ) := −∞ for τ < 0, we obtain

Equation (17): $(-\varphi )^*(s) = \sup _{\tau \ge 0}\left(s\tau + \varphi (\tau )\right)$

This allows us to apply tools from convex analysis. For convex and continuous f Young's inequality holds true (see e.g. [9, equation (4.1) and proposition 5.1]), which states

Equation (18): $s\tau \le f(\tau ) + f^*(s)$ for all $s, \tau \in \mathbb {R}$, with equality if and only if $s \in \partial f(\tau )$.

Moreover, for convex and continuous f we have f** = f (see e.g. [9, proposition 4.1]). Note that the concavity of φ implies that −φ is convex and due to finiteness thus also continuous (see [9, corollary 2.3]).
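For orientation, it is useful to carry out this computation for the index function $\varphi (\tau ) = c\sqrt{\tau }$ (the form arising for $\psi (\tau )=\sqrt{\tau }$ in example 3.2); this is merely a worked special case of (17) and is not needed for the general theory. For α > 0 we have

$(-\varphi )^*\left(-\frac{1}{\alpha }\right) = \sup _{\tau \ge 0}\left(c\sqrt{\tau } - \frac{\tau }{\alpha }\right) = \frac{c^2\alpha }{4},$

the supremum being attained at $\tau = (c\alpha /2)^2$. In particular, the conjugate term used to bound the approximation error grows linearly in α in this case.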

Now we are in a position to prove our deterministic convergence rates result.

Theorem 3.3. Suppose assumptions 1 and 2 hold true and the Tikhonov functional has a global minimizer. Then, we have the following assertions.

  • (i)  
    If $\operatorname{\mathbf {err}}\ge 0$ satisfies assumption 1, then
    Equation (19)
  • (ii)  
    Let $\operatorname{\mathbf {err}}>0$. Then the infimum of the right-hand side of (19) is attained at $\alpha =\overline{\alpha }$ if and only if $\frac{-1}{C_{\rm err}\overline{\alpha }} \in \partial (-\varphi )(C_{\rm err}\operatorname{\mathbf {err}})$, and we have
    Equation (20)

Remark 3.4. As we have seen above, −φ is convex and continuous, and thus we have ∂( − φ)(s) ≠ ∅ for all s > 0 (see e.g. [9, proposition 5.2]), i.e. the parameter choice $\frac{-1}{C_{\rm err}\overline{\alpha}}\in \partial(-\varphi)(C_{\rm err}\operatorname{\mathbf {err}})$ is feasible. If φ is differentiable, then ∂( − φ)(s) = { − φ'(s)} and the given choice is equivalent to $\overline{\alpha} = 1/(C_{\rm err}\varphi ^{\prime }(C_{\rm err}\operatorname{\mathbf {err}}))$.
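Continuing the example $\varphi (\tau ) = c\sqrt{\tau }$ from above, the choice of remark 3.4 becomes explicit: since $\varphi ^{\prime }(s) = \frac{c}{2\sqrt{s}}$, we obtain

$\overline{\alpha } = \frac{1}{C_{\rm err}\,\varphi ^{\prime }(C_{\rm err}\operatorname{\mathbf {err}})} = \frac{2\sqrt{C_{\rm err}\operatorname{\mathbf {err}}}}{c\, C_{\rm err}},$

and, in view of theorem 3.3(ii), the resulting bound is of the order $\varphi (C_{\rm err}\operatorname{\mathbf {err}}) = c\sqrt{C_{\rm err}\operatorname{\mathbf {err}}}$ up to constants. In the Hilbert space setting of example 3.5, where $\operatorname{\mathbf {err}} = 2\Vert g^\dagger -g^{\rm obs}\Vert _{\mathcal {Y}}^2$, this corresponds to the familiar a priori choice $\overline{\alpha }\sim \delta$ for $\Vert g^\dagger -g^{\rm obs}\Vert _{\mathcal {Y}}\le \delta$.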

Note that the same rates of convergence can also be obtained by the a priori parameter choices in [6, 14], which are slightly more explicit.

If φ(τ) = cτ, then the effect of exact penalization (see e.g. [7]) occurs: the parameter choice no longer depends on $\operatorname{\mathbf {err}}$, i.e. convergence for fixed $\bar{\alpha }= \frac{1}{cC_{\rm err}}$ is obtained as $\operatorname{\mathbf {err}}\searrow 0$.

Proof of theorem 3.3. (i). By the definition of uα, we have

Equation (21)

It follows that

(ii). Using the fact that ( − φ)** = −φ, we obtain

where we used the concavity of φ. By the conditions for equality in Young's inequality (18), the supremum is attained at $\alpha =\overline{\alpha }$ if and only if $\frac{-1}{C_{\rm err}\overline{\alpha }} \in \partial (-\varphi )(C_{\rm err}\operatorname{\mathbf {err}})$. □

Example 3.5 (Classical case). Let $F = T : \mathcal {X}\rightarrow \mathcal {Y}$ be a bounded linear operator between Hilbert spaces $\mathcal {X}$ and $\mathcal {Y}$. For $\mathcal S(g_2;g_1) = \mathcal {T}(g_2;g_1) = \Vert g_1 - g_2\Vert _{\mathcal {Y}}^2$ and $\mathcal {R}(u) = \Vert u-u_0\Vert _{\mathcal {X}}^2$ we have $\mathcal D_{\mathcal {R}}^{u^*}(u,u^\dagger ) =\Vert u-u^\dagger \Vert _{\mathcal {X}}^2$, Cerr = 2 and $\operatorname{\mathbf {err}}= 2\Vert g^\dagger -g^{\rm obs}\Vert _{\mathcal {Y}}^2$. As we have seen in example 3.2, (14) implies moreover (13) with φ = φψ.

If $\Vert g^\dagger -g^{\rm obs}\Vert _{\mathcal {Y}} \le \delta$ as mentioned in the introduction, then we obtain for an appropriate parameter choice $\Vert u_{\alpha }-u^\dagger \Vert _{\mathcal {X}} = \mathcal O( \sqrt{ \varphi _\psi (\delta ^2) })$. For the special examples of ψ given in (16), we obtain from example 3.2 that

respectively, and these convergence rates are known to be of optimal order.

4. Convergence rates for Poisson data with a priori parameter choice rule

In this section, we will combine theorems 2.1 and 3.3 to obtain convergence rates for the method (2) with Poisson data. We need the following properties of the operator F.

Assumption 3 (Assumption on the forward operator). Let $\mathcal {X}$ be a Banach space and $\mathfrak B\subset \mathcal {X}$ a bounded, closed and convex subset containing the exact solution u† to (1). Let $\mathbb {M}\subset \mathbb R^d$ be a bounded domain with Lipschitz boundary $\partial \mathbb {M}$. Assume moreover that the operator $F : \mathfrak B\rightarrow \mathcal {Y}:=\mathbf {L}^1 (\mathbb {M})$ has the following properties.

  • (i)  
    F(u) ⩾ 0 a.e. for all $u \in \mathfrak B$.
  • (ii)  
    There exists a Sobolev index $s > \frac{d}{2}$, such that $F(\mathfrak {B})$ is a bounded subset of $H^s (\mathbb {M})$.

Property (i) is natural since photon densities (or more generally intensities of Poisson processes) have to be non-negative. Property (ii) is not restrictive for inverse problems since it corresponds to a smoothing property of F which is usually responsible for the ill-posedness of the underlying problem.

Remark 4.1 (Discussion of assumption 2). Let assumption 3 and (14) hold true. Since we have the lower bound

Equation (22)

with $\mathcal T$ as in (9) at hand (see [5]), we find that (15) implies (13) with $\mathcal T$ as in (9) and an index function differing from φψ as in example 3.2 only by a multiplicative constant.

In particular, if F(u†) = 0 on some parts of $\mathbb {M}$ it may happen that (13) holds true with an index function better than φψ.

Assumption 3 moreover allows us to prove the following corollary, which shows that theorem 2.1 applies for the integrals in (10).

Corollary 4.2. Let assumption 3 hold true, set

Equation (23): $R := \sup _{u \in \mathfrak {B}} \Vert F(u)\Vert _{H^s (\mathbb {M})},$

and consider Z defined in (10) with σ > 0. Then, there exists Cconc ⩾ 1 depending only on $\mathbb {M}$ and s such that

Equation (24)

for all t ⩾ 1, $\rho \ge R\max \lbrace \sigma ^{-\lfloor s \rfloor -1}, |\ln (R)|\rbrace C_{\rm conc}$.

Proof. W.l.o.g. we may assume that R ⩾ 1. Due to Sobolev's embedding theorem and s > d/2 the embedding operator $E_\infty :H^s (\mathbb {M}) \hookrightarrow \mathbf {L}^\infty (\mathbb {M})$ is well defined and continuous, so we have $\Vert F(u) \Vert _{\mathbf {L}^\infty (\mathbb {M})} \le R \Vert E_{\infty } \Vert$ for all $u \in \mathfrak B$.

By an extension argument it can be seen from [23] that for $\mathbb {M}\subset \mathbb R^d$ with Lipschitz boundary, $g \in H^s(\mathbb {M})\cap \mathbf {L}^\infty (\mathbb {M})$ and $\Phi \in C^{\lfloor s \rfloor +1} (\mathbb R)$ one has $\Phi \circ g \in H^s (\mathbb {M})$ and

Equation (25)

with C > 0 independent of Φ and g. To apply this result, we first extend the function x↦ln (x + σ) from $[0, R\Vert E_{\infty } \Vert ]$ (since we have $0 \le F(u) \le R\Vert E_{\infty } \Vert$ a.e.) to a function Φ on the whole real line such that $\Phi \in C^{\lfloor s \rfloor +1} (\mathbb R)$. Then for any fixed $u \in \mathfrak B$ we obtain $\Phi \circ F(u) \in H^s (\mathbb {M})$, and since $\Phi _{|_{[0, R\Vert E_{\infty } \Vert ]}} (\cdot ) = \ln (\cdot + \sigma )$ and $0 \le F(u) \le R\Vert E_{\infty } \Vert$ a.e. we have $\Phi \circ F(u) = \ln (F(u) + \sigma )$ a.e. Since all derivatives up to order ⌊s⌋ + 1 of x↦ln (x + σ) and hence of Φ on $[0, R\Vert E_{\infty } \Vert ]$ can be bounded by some constant of order $\max \lbrace \sigma ^{-\lfloor s \rfloor - 1}, \ln (R\Vert E_{\infty } \Vert )\rbrace$, the extension and composition procedure described above is bounded, i.e. by (25) there exists a constant $\tilde{C}>0$ independent of u, R and σ such that

for all $u \in \mathfrak B$. Now the assertion follows from theorem 2.1. □

Now we are able to present our first main result for Poisson data.

Theorem 4.3. Let assumption 2 with $\mathcal T$ defined in (9) and assumption 3 be satisfied. Moreover, suppose that (2) with $\mathcal {S}$ in (8) has a global minimizer. If we choose the regularization parameter α = α(t) such that

Equation (26)

then we obtain the convergence rate

$\mathbf {E}\left[\mathcal D_{\mathcal {R}}^{u^*}\left(u_{\alpha (t)},u^\dagger \right)\right] = \mathcal {O}\left(\varphi \left(\frac{1}{\sqrt{t}}\right)\right) \qquad \mbox{as } t \rightarrow \infty .$
Proof. After possibly replacing $\Vert \cdot \Vert _{H^s(\mathbb {M})}$ by an equivalent norm, we may assume w.l.o.g. that R in (23) satisfies R ⩾ exp (1). Now note that assumption 1 holds true with Cerr = 1 whenever the bound $\operatorname{\mathbf {err}}$ fulfils (12). By corollary 4.2 the right-hand side of (12) is bounded by $2\frac{\rho }{\sqrt{t}}$ with probability greater than or equal to 1 − exp ( − cρ) with ρ ⩾ 1/c,

$c := \left(R\max \lbrace \sigma ^{-\lfloor s \rfloor -1}, |\ln (R)|\rbrace C_{\rm conc}\right)^{-1}$

and Cconc as in corollary 4.2. Now let $\rho _k:= c^{-1} k, k\in \mathbb N$ and consider the events

with Z as defined in (10). Corollary 4.2 implies

and on Ek assumption 1 holds true with Cerr = 1 and $\operatorname{\mathbf {err}}= 2 \sup _{u \in \mathfrak B} Z(F(u)) \le 2\rho _k / \sqrt{t}$. Thus, theorem 3.3(i) implies

for all $k \in \mathbb N$ and α > 0, where we used 2ρkc−1 ⩾ 1 due to R ⩾ exp (1). According to theorem 3.3(ii) the infimum of the right-hand side is attained at α defined in (26), and

for all $k \in \mathbb N$. Now, we obtain

The sum converges and the proof is complete. □

5. A Lepskii-type parameter choice rule

Usually the parameter choice rule (26) is not implementable since it requires a priori knowledge of the function φ characterizing the smoothness of the unknown solution u†. To adapt to unknown smoothness of the solution, a posteriori parameter choice rules have to be used. In a deterministic context, the most widely used such rule is the discrepancy principle. However, in our context it is not applicable in an obvious way since $\mathcal S$ approximates $\mathcal T$ only up to the unknown constant $\mathbf {E}[\mathcal S(G_t;g^\dagger )]$.

In the following, we will use the Lepskii principle as described and analyzed in the context of inverse problems by Mathé and Pereverzev [22, 21]. Lepskii's balancing principle requires a metric on $\mathcal {X}$, and hence we assume in the following that there exists a constant Cbd > 0 and a number q ⩾ 1 such that

Equation (27)

This is fulfilled trivially with q = 2 and Cbd = 1, if $\mathcal {X}$ is a Hilbert space and $\mathcal {R}(u) = \Vert u-u_0\Vert _{\mathcal {X}}^2$ (then we have equality in (27)). Moreover, for a q-convex Banach space $\mathcal {X}$ and $\mathcal {R}(u) = \Vert u\Vert _{\mathcal {X}}^q$ the estimate (27) is valid (see [32]). Besides this special case of norm powers, (27) can be fulfilled for other choices of $\mathcal {R}$. For example, for maximum entropy regularization, i.e. $\mathcal {R}(u) = \int u \ln (u) \,\mathrm d x$, the Bregman distance coincides with the Kullback–Leibler divergence, and we have seen in remark 4.1 that (27) holds true with $\mathcal {X}=L^2$ in this situation.
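For the maximum entropy example, this coincidence can be checked directly; assuming u, u† > 0 and using the subgradient $u^* = \ln (u^\dagger ) + 1 \in \partial \mathcal {R}(u^\dagger )$ (the argument order below follows the convention of (6)):

$\mathcal D_{\mathcal {R}}^{u^*}(u,u^\dagger ) = \int \left(u\ln (u) - u^\dagger \ln (u^\dagger ) - (\ln (u^\dagger )+1)(u-u^\dagger )\right)\mathrm {d}x = \int \left(u\ln \frac{u}{u^\dagger } - u + u^\dagger \right)\mathrm {d}x = \mathbb {KL}(u;u^\dagger ).$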

The deterministic convergence analysis from section 3 already provides an error decomposition. Assuming β ⩾ 1/2, theorem 3.3(i) together with (27) states that

Equation (28)

with the approximation error fapp(α) and the propagated data noise error fnoi(α) defined by

Equation (29)

Here, the constant 2 in front of Cbd is an estimate of 1/β.

If we provide a finite number of regularization parameters α1, α2, ..., αm, then the aim of Lepskii's balancing principle is to select a regularization parameter $\alpha _{j_{\rm bal}}$ which yields order optimal convergence rates. For this to be possible we need to ensure that the approximation error fapp behaves sufficiently well, i.e. that $\min _{j = 1, \dots , m}f_{\rm app}(\alpha _j)$ is bounded from above up to some constant by $\inf _{\alpha >0}f_{\rm app}(\alpha )$. In particular, for φ(τ) = cτ we find that fapp is given by an indicator function, and this case should be excluded. Therefore, we assume additionally that $\varphi ^{1 + \varepsilon }$ is concave for some ε > 0. This assumption improves the corresponding result from [30] where ε = 1 is used.

For the error decomposition (28), it is important to note that fapp is typically unknown, whereas fnoi is known if the upper bound $\operatorname{\mathbf {err}}$ is available. But due to corollary 4.2 the error is bounded by $\rho / \sqrt{t}$ with probability at least 1 − exp ( − cρ). This observation is fundamental in the proof of the following theorem.
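To illustrate the mechanism before stating the result, the following Python fragment sketches a generic Lepskii-type balancing rule: given reconstructions $u_{\alpha _1},\dots ,u_{\alpha _m}$ for an increasing sequence of regularization parameters and a computable bound noise_bound(i) on the propagated data error for $\alpha _i$, one selects the largest index whose reconstruction stays within a fixed multiple of that bound of all reconstructions with smaller parameters. The factor 4 and the comparison rule are placeholders; the rule (30) analysed below may differ in its constants.

import numpy as np

def lepskii_balancing(reconstructions, noise_bound, factor=4.0):
    """Generic Lepskii-type balancing: reconstructions[j] belongs to alpha_j (increasing in j),
    noise_bound(i) bounds the propagated data error for alpha_i (non-increasing in i)."""
    m = len(reconstructions)
    admissible = [
        j for j in range(m)
        if all(np.linalg.norm(reconstructions[j] - reconstructions[i]) <= factor * noise_bound(i)
               for i in range(j))
    ]
    return max(admissible)        # j = 0 is always admissible, so the maximum exists

# usage sketch (hypothetical): for alpha_j = alpha_1 * r**(j-1) the bound err/alpha_j is non-increasing,
# e.g. j_bal = lepskii_balancing(u_list, lambda i: err_bound / (alpha_1 * r ** i))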

Theorem 5.1. Let assumptions 2 and 3 with $\beta \in [\frac{1}{2}, \infty )$, φ such that $\varphi ^{1+\varepsilon }$ is concave for some ε > 0 and $\mathcal S$, $\mathcal T$ as in (8) and (9) be fulfilled and suppose (27) holds true. Suppose that (2) has a global minimizer and let σ > 0, r > 1, $R := \sup _{u \in \mathfrak B} \Vert F(u)\Vert _{H^s (\mathbb {M})} < \infty$ and $\tau \ge \frac{1}{2} R \max \lbrace \sigma ^{-\lfloor s \rfloor -1},|\ln (R)|\rbrace C_{\rm conc}$. Define the sequence

$\alpha _j := r^{j-1}\,\frac{\tau \ln (t)}{\sqrt{t}}, \qquad j \in \mathbb {N}.$
Then with $m := \min \lbrace j \in \mathbb N \big | \alpha _j \ge 1\rbrace$ the choice

Equation (30)

yields

Proof. If t ⩾ exp (4), the assumptions of corollary 4.2 are fulfilled with ρ(t) := τln (t). Then with Z as in (10) the event

has probability $\mathbf {P}[A_\rho ^c] \le \exp (-c\rho (t))$ with $c = \left(R\max \lbrace \sigma ^{-\lfloor s \rfloor -1}, |\ln (R)|\rbrace C_{\rm conc}\right)^{-1}$ by corollary 4.2. Moreover, as we have seen in (28), on Aρ the error decomposition

holds true with

and ϕ = fapp as in (29). Note that 2ψ(i) corresponds to the required bound for $\Vert u_{\alpha _i} - u_{\alpha _j}\Vert _{\mathcal {X}}$ in (30). The function ψ is obviously non-increasing and fulfils $\psi (j) \le r^{2/q}\psi (j + 1)$ and it can be seen by elementary computations that ϕ is monotonically increasing. Now [21, corollary 1] implies the so-called oracle inequality

By inserting the definitions of ϕ and ψ, we find

Equation (31)

The concavity of $\varphi ^{1+\varepsilon }$ implies that $\varphi (C\tau ) \le C^{\frac{1}{1+\varepsilon }}\varphi (\tau )$ for all C ⩾ 1 and τ ⩾ 0, and hence we find by substituting $\tau /C = C^{1/\varepsilon }\eta$ in (17) that

whenever C ⩾ 1. Moreover, $\alpha \mapsto (-\varphi )^* (-\frac{1}{\alpha })$ is monotonically increasing, i.e. we have $(-\varphi )^* (-\frac{1}{C\alpha }) \le \max \lbrace 1,C^{\frac{1}{\varepsilon }}\rbrace (-\varphi )^* (-\frac{1}{\alpha })$ for all C, α > 0. Now it is straightforward to show that the minimum over α1, ..., αm can be replaced up to some constant depending only on r by the infimum over α ⩾ α1 if t is sufficiently large (see [30, lemma 3.42] for ε = 1). By theorem 3.3(ii) the sum $(-\varphi )^* (-1/\alpha ) + \rho (t)/(\sqrt{t}\alpha )$ attains its minimum over α ∈ (0, ∞) at α = αopt if and only if $1/\alpha _{\rm opt} \in -\partial (-\varphi ) (\rho (t)/\sqrt{t})$. Note that $\rho (t)/\sqrt{t} = \alpha _1$. By elementary arguments from convex analysis, we find using the concavity of φ that

for all 0 ⩽ s ⩽ α1. Thus choosing s = 0 shows that $\alpha _1 / \alpha _{\rm opt} \le \varphi (\alpha _1 ) = \varphi (\rho (t)/\sqrt{t})$, for all t > 0. As the right-hand side decays to 0 as t, we have α1 ⩽ αopt for t sufficiently large. Therefore, the minimum in (31) can indeed be replaced (up to some constant) by the infimum over all α > 0. Defining $\text{diam}(\mathfrak B) := \sup _{u,v \in \mathfrak B} \Vert u - v\Vert _{\mathcal {X}}$ which is finite by assumption 3 we find from (31) and theorem 3.3 that

with some constant C > 0. Due to the definition of ρ, 2τc ⩾ 1, ln (t) ⩾ 1 and $\ln (t)/\sqrt{t} <1$ we obtain

using the concavity of φ. This proves the assertion. □

Note that the constants R and Cconc—which are necessary to ensure a proper choice of the sequence αj and hence for the implementation of this Lepskii-type balancing principle—can be calculated in principle (assuming e.g. the scaling condition $\Vert g^\dagger \Vert _{\mathbf {L}^1(\mathbb {M})}=1$). Thus theorem 5.1 yields convergence rates in expectation for a completely adaptive algorithm.

Comparing the rates in theorems 4.3 and 5.1 note that we have to pay a logarithmic factor for adaptation to unknown smoothness by the Lepskii principle. It is known (see [29]) that in some cases the loss of such a logarithmic factor is inevitable.

Acknowledgments

We would like to thank Patricia Reynaud–Bouret for fruitful discussions on the concentration inequality. Financial support by the German Research Foundation DFG through the Research Training Group 1023 and CRC 755 is gratefully acknowledged.

Appendix. Proof of theorem 2.1

In this section, we will prove the uniform concentration inequality stated in theorem 2.1. Our result is based on the work of Reynaud–Bouret [26] who proved the following concentration inequality.

Lemma A.1 ([26, corollary 2]). Let N be a Poisson process with finite mean measure ν. Let $\lbrace f_a\rbrace _{a \in A}$ be a countable family of functions with values in [ − b, b] and define

Then for all positive numbers ρ and ε it holds

where κ(ε) = 5/4 + 32/ε.

We will use a denseness argument to apply lemma A.1 to $t\tilde{Z}$ with

The properties derived in the following lemma will be sufficient to bound $\mathbf {E}[\tilde{Z}]$.

Lemma A.2. Let $\mathbb {M}\subset \mathbb R^d$ be a bounded domain with Lipschitz boundary, R > 0 and suppose $s>\frac{d}{2}$. Then there exists a countable family of real-valued functions $\lbrace \phi _\mathbf {j}: \mathbf {j}\in \mathcal {J}\rbrace$, numbers $\gamma _\mathbf {j}$, $\mathbf {j}\in \mathcal {J}$ and constants c1, c2 > 0 depending only on s and $\mathbb {M}$ such that

Equation (A.1)

and for all $\mathfrak g\in B_s(R)$ there exist real numbers $\beta _\mathbf {j}, \mathbf {j}\in \mathcal {J}$ such that

Equation (A.2)

Proof. Choose some κ > 0 such that $\overline{\mathbb {M}} \subset (-\kappa ,\kappa )^d$. Then, there exists a continuous extension operator $E: H^s (\mathbb {M}) \longrightarrow H^s_0([-\kappa , \kappa ]^d)$ (see e.g. [31, corollary 5.1]). Consider the following orthonormal bases $\lbrace \varphi _j:j\in \mathbb {Z}\rbrace$ of $\mathbf {L}^2([-\kappa ,\kappa ])$ and $\lbrace \phi _\mathbf {j}:\mathbf {j}\in \mathbb {Z}^d\rbrace$ of $\mathbf {L}^2([-\kappa ,\kappa ]^d)$:

We introduce the norm $\Vert \mathfrak {g}\Vert _{H^s_{\rm per}} = \big (\sum _{j\in \mathbb {Z}^d}(1+|\mathbf {j}|^2)^s |\langle \mathfrak {g},\phi _{\mathbf {j}}\rangle |^2\big )^{1/2}$ and the periodic Sobolev space $H^s_{\rm per} \big ([-\kappa ,\kappa ]^d\big ):=\lbrace \mathfrak {g}\in \mathbf {L}^2([-\kappa ,\kappa ]^d) \big | \Vert \mathfrak {g}\Vert _{H^s_{\rm per}}<\infty \rbrace$. The embedding J: Hs0([ − κ, κ]d)↪Hsper([ − κ, κ]d) is well defined and continuous as the norms of both spaces are equivalent (see e.g. [31, exercise 1.13]), so the extension operator

is continuous. In particular,

and (A.2) holds true with $\beta _\mathbf {j}:= \langle \mathfrak g,\phi _\mathbf {j}\rangle$ and $\gamma _\mathbf {j}:= (1 + |\mathbf {j}|^2 )^{-s/2}$. Moreover, as $\Vert \phi _\mathbf {j}\Vert _{\infty }^2 \le \kappa ^{-d}$ for all $\mathbf {j}\in \mathbb {Z}^d$ we obtain

and majorization of the sum by an integral shows that c1 < ∞ as s > d/2. Therefore, (A.1) holds true, and the proof is complete. □

Lemma A.3. Under the assumptions of lemma A.2, we have

Proof. With the help of lemma A.2 we can now insert (A.2) and apply Hölder's inequality for sums to find

where we used that the functions $\phi _\mathbf {j}$ are real-valued. Hence, by Jensen's inequality

Equation (A.3)

Using (3) we obtain

and plugging this into (A.3) and using (A.1) we obtain

 □

Proof of theorem 2.1. By Sobolev's embedding theorem the embedding operator $E_\infty \!\!:H^s (\mathbb {M}) \hookrightarrow \mathbf {L}^\infty (\mathbb {M})$ is well defined and continuous, so

Equation (A.4)

Now we choose a countable subset $\lbrace \mathfrak g_a\rbrace _{a \in A} \subset B_s(R)$ which is dense in Bs(R) w.r.t. the Hs-norm and hence also the $\mathbf {L}^\infty$-norm, and set N = tGt and $\,\mathrm d \nu = t g^\dagger \,\mathrm d x$ in lemma A.1 to obtain

Equation (A.5)

for all $\bar{\rho }> 0$. Choosing ε = 1 and using lemma A.3 and the simple estimate

yields

Equation (A.6)

for all $\bar{\rho }, t > 0$ with $C_1:= 2\sqrt{c_1} c_2\sqrt{\Vert g^\dagger \Vert _{\mathbf {L}^1(\mathbb {M})}}$, $C_2 := \sqrt{12}\Vert E_{\infty } \Vert \sqrt{\Vert g^\dagger \Vert _{\mathbf {L}^1 (\mathbb {M})}}$ and $C_3 :=(32 + \frac{5}{4})\Vert E_{\infty } \Vert$. If $t, \bar{\rho }\ge 1$, we have $\frac{1}{t} \le \frac{1}{\sqrt{t}}$ and $\sqrt{\bar{\rho }}\le \bar{\rho }$, so

Setting Cconc := max {C1 + C2 + C3, 1} and $\rho := \bar{\rho }RC_{\rm conc}$ this shows the assertion. □
