Preconditioned alternating projection algorithms for maximum a posteriori ECT reconstruction

Andrzej Krol; Si Li; Lixin Shen; Yuesheng Xu

doi:10.1088/0266-5611/28/11/115005

1. Introduction

Emission computed tomography (ECT) is a noninvasive molecular imaging method that requires administration of radioactive tracers to patients. It comprises two branches: positron-emission tomography (PET) and single-photon ECT (SPECT) [36, 46]. It might provide estimates of the spatial and/or temporal distribution of radioactive tracer inside a patient's body through tomographic reconstruction from the detected emission events (typically in the form of projection images for SPECT and sinograms or list data for PET). ECT provides a three-dimensional (3D) functional rather than structural information provided by CT or magnetic resonance imaging (MRI). In a SPECT imaging system, the detectors (detector units) record the number of single events due to gamma or x-ray photons emitted by SPECT radioactive tracer distributed inside a patient's body. The photons are detected only if they travel along directions well defined by a collimator. The collection of such emission data can be sorted into a set of 2D projection images [46]. In PET systems, the pairs of detectors record the number of coincidence events due to two 511 keV gamma photons emitted in positron annihilation events. Positrons are emitted by PET radioactive tracer distributed inside a patient's body. They travel certain randomly distributed distances in the tissue before they annihilate electrons and produce pairs of 511 keV photons emitted in random and approximately opposite directions. The collections of such emission data form list data and can be sorted into a set of sinograms [36]. Clinical applications of ECT include detection, staging and monitoring response to cancer therapy, detection and risk stratification of cardiovascular diseases, mapping of regional blood flow in the brain, bone scans, pulmonary ventilation/perfusion scans and renal scans [48].

The aim of the reconstruction process is to obtain accurate estimation of the radiotracer distribution in a patient or a phantom from the detected emission photons. The most commonly used probability distribution for a description of raw ECT data is the Poisson model [25, 42, 46]. It states that the vector of the number of events recorded by the detector units during ECT scan is a Poisson distributed random vector with a mean equal to the sum of the system matrix multiplied by the mean radiotracer activity distribution vector within an object of interest and by the mean 'background' counts vector. Sources of background counts include cosmic rays and terrestrial radioactive background. They are assumed to follow the Poisson distribution. For a given realization of the detected ECT data and for the known expected background counts, this model allows one to estimate the mean radiotracer activity distribution.

Finding numerical solutions of the model is a long-standing research problem. The iterative expectation-maximization (EM) algorithm is a frequently used reconstruction method for ECT imaging [25, 44]. Many reconstruction algorithms for the model are based on optimizing an objective function deduced in part from the statistical model of detection realization data. For example, the maximum-likelihood (ML) method is based on minimizing the negative log-likelihood of observed emission data conditional on radiotracer distribution. Under the Bayesian framework, the maximum a posteriori (MAP) estimator seeks to minimize the sum of the negative log-likelihood of observed emission data conditional on radiotracer distribution and a regularizing penalty function, which penalizes solutions that have low probability. With various considerations such as the computational efficiency of the algorithm to be developed and the spatial resolution of the reconstructed images, many different types of penalty functions were proposed. The total-variation (TV)-based penalty function introduced in [20, 35] is particularly interesting in the field of ECT image reconstruction because it preserves the high spatial frequency components of the reconstructed radiopharmaceutical distribution including discontinuities and steep gradients. Many efficient algorithms for this model were proposed, including EM-based methods [20, 35, 41], projected quasi-Newton methods [1–3] and forward–backward approaches [7, 12, 43]. In particular, a nested EM-TV iterative scheme was proposed recently in [41] for reconstruction of PET data with a low signal-to-noise ratio, while an alternating extragradient method (AEM) was proposed in [7] for solving the primal–dual formulation of the Poisson ECT data model with the TV regularization.

In this paper, we study the numerical solution of the model that optimizes the sum of the negative log-likelihood of observed emission data conditional on radiotracer distribution (i.e. the Kullbach–Leibler divergence) and a TV regularization term. The ECT detector physics requires that the solution of this model be nonnegative. Difficulties with the numerical solution of such a model stem from the nonlinearity due to the use of the Kullbach–Leibler divergence and the TV regularization, from the nonnegativity constraint on the solution, and from the dense and large-sized system matrix necessary for realistic ECT models. Motivated by our previous work [27, 29, 30], we characterize the solutions of the model in terms of a fixed-point of the proximity operators with a precondition matrix. Numerical algorithms are then developed based on such a characterization. More precisely, by identifying the TV as a composition of the convex function, which defines the 'ℓ¹-norm' or the 'ℓ²-norm', and the first-order difference operator, we formulate a characterization of the exact positive solutions to the optimization problem in terms of a system of fixed-point equations via the proximity operators of the convex function that defines the 'ℓ¹-norm' or the 'ℓ²-norm' and the one that defines the first quadrant of an Euclidean space. The proximity operators of these simple functions have close forms, which provide great computational advantages. The nonlinearity is expressed in terms of a system of fixed-point equations, which naturally leads to an alternating projection algorithm. The dense system matrix of large size is treated by preconditioning. Appropriate choices of the preconditioning matrices lead to efficient computational algorithms for solving the model.

This paper is divided into seven sections. In section 2, we outline the maximum a posteriori ECT image reconstruction model in terms of a somewhat general constrained convex optimization problem. In section 3, characterizations of the solution of the optimization problem are presented in terms of a system of fixed-point equations via proximity operators, and the iterative algorithms for finding the solution are developed based on the characterizations. Section 4 is devoted to the convergence analysis of the algorithm. In section 5, we specialize the general algorithm to the TV-regularized ECT image reconstruction problem. Numerical experiments are presented in section 6 to test the approximation accuracy and computational efficiency of the proposed algorithm. The numerical results demonstrate that the alternating projection algorithm with the EM-preconditioner outperforms significantly the EM-TV in both the convergence speed and the image quality. We also observe that our algorithm performs favorably in comparison to the nested EM-TV in both the convergence speed and the image quality. We make concluding remarks in section 7.

2. Maximum a posteriori estimation for ECT reconstruction

In this section, we first present a mathematical model of a realistic ECT imaging system. We then find the unknown radioactive tracer distribution by maximizing the posterior probability distribution (the object function) using the observed emission data, knowing the probability density function of the unknown radioactive tracer distribution and the Bayes law. This approach is called the maximum a posteriori expectation-maximization (MAP-EM). Finally, the existence of a solution of the resulting variational problem is discussed.

We begin by introducing the notation to be used throughout this paper. Let $\mathbb {N}_k:=\lbrace 1,2,\ldots ,k\rbrace$ . We use the same notation '1' to represent both the scalar number 1 and the vector with all components equal to 1. They can be distinguished by the context of their use. For $x \in \mathbb {R}^k$ , the expression x ⩾ 0 means that all components of x are no less than 0, and in this case, we say that x is a nonnegative vector. We denote by $\mathbb {R}^k_{+}$ the set $\lbrace x: x \in \mathbb {R}^k \; \mbox{and} \; x \ge 0\rbrace$ . For any vectors x and y in $\mathbb {R}^k$ , we define

$\begin{eqnarray*} &&x\odot y :=(x_iy_i: i \in \mathbb {N}_k)\ \ \mbox{and}\ \ \frac{x}{y}:=\left(\frac{x_i}{y_i}: i \in \mathbb {N}_k\right), \end{eqnarray*}$

respectively, as the componentwise multiplication of x and y, and componentwise division of x by y. The logarithmic function at $x \in \mathbb {R}^k$ is defined as

$\begin{eqnarray*} &&\ln x : = (\ln x_i: i \in \mathbb {N}_k), \end{eqnarray*}$

while the expression 'x + γ', the sum of the vector x with a scalar $\gamma \in \mathbb {R}$ , is understood as the vector

$\begin{eqnarray*} &&x+\gamma :=(x_i+\gamma : i \in \mathbb {N}_k). \end{eqnarray*}$

We use 〈 ·, ·〉 and || · ||, respectively, for the inner product and the corresponding ℓ²-norm in an Euclidean space, while we use || · ||₁ and || · ||_∞, respectively, for the ℓ¹-norm and ℓ^∞-norm.

It is well accepted that gamma and x-ray photons, as well as positrons emitted in radioactive decay, follow a Poisson distribution. Assuming that the detection of photons by detector units are events independent of one another and can be described by a Bernoulli process, consequently, the projection images are a collection of Poisson random variables. The Poisson distribution approximates the photon-detection process only if one can neglect the detector dead-time, i.e. count losses in the real detector systems, and no corrections are applied to the raw data. In addition to photons emitted from the patient, the detector bins (units) detect events due to ubiquitous radioactive terrestrial background and cosmic rays. It is assumed that they also follow the Poisson distribution.

Let f represent the expected radiotracer distribution within a patient or a phantom. Let g denote a vector with the ith component being the number of single photons for SPECT or numbers of pairs of annihilation photons for PET originated from the radiotracer and recorded by the ith detector unit or detector unit pair during the SPECT or PET scan, respectively. The dimension of the vector g is the number of detector units or detector unit pairs in the SPECT or PET imaging system, respectively. We assume that γ is a vector of the same size as g, with its ith component being the mean number of 'background' counts recorded by the ith bin in SPECT or bin pair in PET. Under these assumptions, the observed emission data vector g related to the unknown radiotracer distribution f can be approximated by the following model [25, 42]:

$\begin{equation} g= \mathrm{Poisson} \left( Af +\gamma \right), \end{equation} \tag{ 1 }$

where Poisson(α) denotes a Poisson distributed random vector with mean α, and A is the ECT system matrix with its (ij)th element equal to the probability of detection of the photon emitted from voxel j of image f by the ith detector bin in SPECT or detector bin pair in PET.

The MAP probability EM estimate has been proven useful in ECT for estimating the unobservable radiotracer distribution f when prior knowledge on probability distribution function of f is available, especially when the observed emission data g are noisy or incomplete [9, 16–19, 21, 32]. Specifically, we assume that g in (1) is a given random vector in $\mathbb {R}^m$ and f is a random vector in $\mathbb {R}^d$ . The MAP estimate f_⋆ is obtained by maximizing the conditional a posteriori probability p(f|g), the probability that f occurs when g is observed. This probability may be computed using the Bayes law:

$\begin{eqnarray} &&p(f|g) \varpropto p(g|f)p(f), \end{eqnarray} \tag{ 2 }$

where α∝β means that the scalar α is proportional to the scalar β. By taking the logarithm of both sides of equation (2), the MAP estimate can then be calculated using the formula

$\begin{equation} f_\star =\arg \max \left\lbrace \ln p(g|f)+\ln p(f): f\in \mathbb {R}^{d}\right\rbrace . \end{equation} \tag{ 3 }$

In other words, the MAP estimate f_⋆ is obtained by maximizing expectation, i.e. by minimizing the sum of a negative log-likelihood of the observed emission data conditional on radiotracer distribution and a positive logarithm of the prior probability distribution. The first term can be considered as a fidelity term, a measure of the discrepancy between the estimated and the observed data. The second term is a regularization function, which penalizes solutions that have low probability. The Gibbs priors are commonly used in ECT reconstruction [15, 21, 26] in both convex [14, 17, 32, 34] and nonconvex [18, 22, 47] forms

$\begin{equation} p(f) \varpropto\exp (-\lambda U(f)) \end{equation} \tag{ 4 }$

with the Gibbs real-valued energy function U(f) defined on $\mathbb {R}^{d}$ and a positive regularization parameter λ called hyperparameter.

We formulate our MAP ECT reconstruction model from (3) and (4) by computing the likelihood objective p(g|f) using equation (1) and specifying the energy function U. According to equation (1), g follows the Poisson distribution having Af + γ as its mean. As a result, the probability density function p(g|f) of g conditioned on f can be computed by using the formula

$\begin{equation} p(g|f)=\prod _{i\in \mathbb {N}_{m}}\frac{\left((Af)_{i}+\gamma _{i}\right)^{g_{i}}\exp (-\left((Af)_{i}+\gamma _{i}\right))}{g_{i}!}. \end{equation} \tag{ 5 }$

We choose the energy function U in (4) as the TV semi-norm. Accordingly, following the notation used in [29], we have that

$\begin{equation} U(f):=(\varphi \circ B)(f), \end{equation} \tag{ 6 }$

where B is an n × d first-order difference matrix and φ is the ℓ¹-norm for the anisotropic TV or the ℓ²-norm for the isotropic TV on $\mathbb {R}^{n}$ . We shall provide more details of the TV later in section 5. Taking the logarithm of both sides of equation (5) and incorporating identities

$\begin{eqnarray*} &&\sum _{i\in \mathbb {N}_{m}} g_i \ln ((Af)_i+\gamma _i) =\langle \ln (Af+\gamma ),g \rangle \end{eqnarray*}$

and

$\begin{eqnarray*} &&\sum _{i\in \mathbb {N}_{m}} (Af)_i=\langle Af,1 \rangle \end{eqnarray*}$

in the resulting equation, we obtain that

$\begin{eqnarray*} &&\ln p(g|f) = \langle \ln (Af+\gamma ),g \rangle - \langle Af,1 \rangle +\mathrm{const}_1, \end{eqnarray*}$

where const₁ is a constant independent of f. Moreover, from (4) and (6), we have that

$\begin{eqnarray*} &&\ln p(f) = -\lambda (\varphi \circ B)(f) + \mathrm{const}_2, \end{eqnarray*}$

where const₂ is again a constant independent of f. Substituting the above two equations into (3) leads to the following variational problem:

$\begin{equation} \min \left\lbrace \langle Af,1 \rangle - \langle \ln (Af+\gamma ),g \rangle +\lambda (\varphi \circ B)(f): f \in \mathbb {R}^d_+ \right\rbrace . \end{equation} \tag{ 7 }$

Although the variational problem (7) is derived in the specific context of the ECT image reconstruction, in this paper, we shall consider solving the problem in the following somewhat more general setting. We assume that A is a matrix in $\mathbb {R}^{m\times d}$ , g is a given vector in $\mathbb {R}^m$ , γ is a positive vector in $\mathbb {R}^m$ , λ is a positive number, φ is a convex nonnegative function on $\mathbb {R}^n$ and B is an n × d matrix. The ECT image reconstruction model is a special case of model (7) when φ○B is chosen as the TV semi-norm. For this reason, we call the above variational problem the Poisson-TV model. In the remaining part of this section and sections 3 and 4, we consider model (7) in the general setting described above, while in sections 5 and 6 we specify it for the ECT image reconstruction model. In model (7), the requirement of $f \in \mathbb {R}^d_+$ is feasible due to the fact that f represents the mean radiotracer activity distribution in the object in ECT.

In the remaining part of this section, we establish the existence of the solutions to the variational problem (7). Recall that for a lower semicontinuous and convex function F defined over $\mathbb {R}^{d}$ , a sufficient condition for F to have a minimizer over a closed convex set C is that the intersection of C and a lower level set of F at some height $\xi \in \mathbb {R}$ defined by

$\begin{eqnarray*} &&\mathrm{lev}_{\le \xi }F := \left\lbrace f: f\in \mathbb {R}^{d}, F(f)\le \xi \right\rbrace \end{eqnarray*}$

is nonempty and bounded (see, e.g., [5]). We require some conditions on the system matrix A, motivated from the ECT imaging system. We denote by $\mathbb {A}$ the collection of m × d matrices, each of whose columns is a nonzero vector in $\mathbb {R}^m_+$ . Since photons emitted from each voxel of an object are detected by detector bins in the ECT imaging system, it is reasonable to assume that the general system matrix A in (1) is in $\mathbb {A}$ .

Proposition 2.1. If $A\in \mathbb {A}$ , $g\in \mathbb {R}^m_+$ , γ is a positive vector in $\mathbb {R}^m$ , λ is a positive number, φ is a convex nonnegative function on $\mathbb {R}^n$ and B is an n × d matrix, then the solution set of model (7) is nonempty.

Proof. Let $F: \mathbb {R}^d_+ \rightarrow \mathbb {R}$ be defined at any $f \in \mathbb {R}^d_+$ as

$\begin{eqnarray*} &&F(f):=\langle Af,1 \rangle - \langle \ln (Af+\gamma ),g \rangle +\lambda (\varphi \circ B)(f). \end{eqnarray*}$

Clearly, F is the objective function of the variational problem (7). It suffices to prove that F is convex and coercive on the unbounded convex domain $\mathbb {R}^d_+$ .

Note that φ is a convex function. The convexity of F on $\mathbb {R}^d_+$ is equivalent to the convexity of the function

$\begin{eqnarray*} &&F_1:=\langle A \cdot ,1 \rangle - \langle \ln (A \cdot +\gamma ),g \rangle \end{eqnarray*}$

on $\mathbb {R}^d_+$ . Since F₁ is twice continuously differentiable and the Hessian of F₁ at $f \in \mathbb {R}^d_+$ ,

$\begin{eqnarray*} &&\nabla ^2 F_1 (f) := A^{\top }\mathrm{diag}\left( \frac{g}{(Af+\gamma )^{2}} \right)A, \end{eqnarray*}$

is positive semi-definite due to $g \in \mathbb {R}^m_+$ , the convexity of F follows.

Since φ is bounded below by 0, for any $\xi \in \mathbb {R,}$ we have that lev_⩽ξF⊂lev_⩽ξF₁. Hence, by theorem 11.9 in [5], the existence of the minimizer of the objective function F over R^d₊ follows from the boundedness of the lower level set lev_⩽ξF₁ for some $\xi \in \mathbb {R}$ . Recalling the exercise 14 of section 1.2 in [8] and the conditions imposed on A, g and γ, we know that the function F₁ has compact lower level sets, which in turn completes the proof. □

3. Characterizations and algorithms

In this section, we first characterize solutions of the optimization problem (7) via the proximity operator. We then develop an alternating projection algorithm for solving the optimization problem based on the characterizations.

Let $\mathbb {H}$ denote an Euclidean space. For a proper convex function $\psi : \mathbb {H} \rightarrow \mathbb {R} \cup \lbrace +\infty \rbrace$ , having a nonempty domain (the set on which ψ is finite), the proximity operator of ψ, denoted by prox_ψ, is a mapping from $\mathbb {H}$ to itself, defined for a given vector $x \in \mathbb {H}$ by

$\begin{equation} \mathrm{prox}_{\psi }(x):=\arg \min \big\lbrace \psi (u)+\case{1}{2}\Vert u-x\Vert ^2: u\in \mathbb {H} \big\rbrace . \end{equation} \tag{ 8 }$

The subdifferential of a proper function ψ on $\mathbb {H}$ at a given vector $x \in \mathbb {H}$ is the set defined by

$\begin{eqnarray*} &&\partial \psi (x):=\lbrace y: y\in \mathbb {H} \quad \mbox{and} \quad \psi (z)\ge \psi (x)+\langle y,z-x\rangle \quad \mbox{for all} \quad z\in \mathbb {H}\rbrace . \end{eqnarray*}$

The subdifferential and the proximity operator of the function ψ are intimately related. Specifically, for x in the domain of ψ and $y \in \mathbb {H,}$ we have that

$\begin{equation} y\in \partial {\psi }(x) \quad \mbox{if and only if} \quad x=\mathrm{prox}_{\psi }(x+y). \end{equation} \tag{ 9 }$

For a discussion of this relation, see, e.g., [5, proposition 16.34] and [29, 37].

The indicator function of a closed convex set C in $\mathbb {H}$ is defined as

$\begin{eqnarray*} \iota _C (u) := \left\lbrace \begin{array}{@{}ll@{}}0, & \hbox{if $u \in C$,} \\ +\infty , & \hbox{otherwise.} \end{array} \right. \end{eqnarray*}$

It can be observed that the proximity operator for the indicator function of a closed convex subset C in $\mathbb {H}$ is the projection operator onto C. For notational simplicity, when the set C is $\mathbb {R}^d_{+}$ , we let

$\begin{eqnarray*} &&\Upsilon :=\iota _{\mathbb {R}^d_{+}}. \end{eqnarray*}$

The subdifferential of ϒ can be explicitly given. To this end, for a given vector $x=(x_i: i \in \mathbb {N}_d)$ in $\mathbb {R}^d_{+}$ , we define a projection matrix P associated with x by

$\begin{eqnarray*} &&P(x):=\mathrm{diag}(\delta (x_i): i \in \mathbb {N}_d), \end{eqnarray*}$

where δ(x_i) equals to 1 if x_i = 0, and 0 otherwise. Then, by the definition of subdifferential, for a vector x in $\mathbb {R}^d_{+}$ , we have that

$\begin{equation} \partial \Upsilon (x) = -P(x) \mathbb {R}^d_{+}. \end{equation} \tag{ 10 }$

Define

$\begin{eqnarray*} &&\mathbb {S}:=\lbrace S: \mbox{$S$ is a $d\times d$ diagonal matrix with positive diagonal entries}\rbrace . \end{eqnarray*}$

For any $S \in \mathbb {S}$ and $x \in \mathbb {R}^d_{+}$ , from (10) together with the identity $S \mathbb {R}^{d}_{+} =\mathbb {R}^{d}_{+}$ , we have that

$\begin{eqnarray*} &&S \partial \Upsilon (x) = -S P(x) \mathbb {R}^d_{+} = -P(x) S \mathbb {R}^d_{+} = -P(x) \mathbb {R}^d_{+}=\partial \Upsilon (x). \end{eqnarray*}$

Thus, we observe that

$\begin{equation} S \circ \partial \Upsilon = \partial \Upsilon . \end{equation} \tag{ 11 }$

For given positive numbers λ and μ, a vector g in $\mathbb {R}^m_+$ , an m × d matrix $A \in \mathbb {A}$ , an n × d matrix B, we define $H: \mathbb {R}^{d}_+\times \mathbb {R}^{n}\rightarrow \mathbb {R}$ , at $(f,b) \in \mathbb {R}^{d}_+\times \mathbb {R}^{n}$ , as

$\begin{equation} H(f,b):=\langle Af,1 \rangle - \langle \ln (Af+\gamma ),g \rangle +\lambda \mu \langle Bf,b \rangle . \end{equation} \tag{ 12 }$

We use ∇_fH and ∇_bH to denote the gradient of H with respect to its first and the second variables, respectively. More precisely, we have for any point $(f,b) \in \mathbb {R}^{d}_+\times \mathbb {R}^{n}$ that

$\begin{eqnarray} \nabla _{f}H(f,b)&=&A^\top \left(1-\frac{g}{Af+\gamma }\right)+\lambda \mu B^{\top }b, \end{eqnarray} \tag{ 13 }$

$\begin{eqnarray} \nabla _{b}H(f,b)&=&\lambda \mu Bf . \end{eqnarray} \tag{ 14 }$

Set

$\begin{equation} \tau :=\frac{\beta }{\lambda } \quad \mbox{and} \quad \sigma :=\frac{1}{\lambda \mu }. \end{equation} \tag{ 15 }$

With this preparation, we present below a characterization of the solutions of model (7) via coupled fixed-point equations.

Theorem 3.1. Let $A\in \mathbb {A}$ , $g\in \mathbb {R}^m_+$ , γ and λ be two positive numbers, φ be a proper convex nonnegative function defined on $\mathbb {R}^n$ , and B be an n × d matrix. If $f\in \mathbb {R}^{d}_+$ is a solution of the minimization problem (7), then for any β, μ > 0, and $S\in \mathbb {S}$ , there exists $b\in \mathbb {R}^{n}$ such that the pair $(f,b) \in \mathbb {R}^d_+ \times \mathbb {R}^{n}$ is a solution of the following coupled equations:

$\begin{equation} b = \left(\mathcal {I}-\mathrm{prox}_{{\mu }^{-1}\varphi }\right)\left(b+ \sigma \nabla _{b}H(f,b)\right), \end{equation} \tag{ 16 }$

$\begin{equation} f = \mathrm{prox}_{\Upsilon }\left(f- \tau S \nabla _{f}H(f,b)\right). \end{equation} \tag{ 17 }$

Conversely, if there exist β, μ > 0, $S \in \mathbb {S}$ , $b\in \mathbb {R}^{n}$ and $f\in \mathbb {R}^{d}_+$ such that the above equations hold, then f is a solution of the minimization problem (7).

Proof. Let f be a solution of the minimization problem (7). Applying Fermat's rule to (7), we obtain the relation

$\begin{equation} 0 \in A^{\top }\left( 1-\frac{g}{Af+\gamma } \right)+\lambda B^{\top }\partial \varphi (Bf)+ \partial \Upsilon (f). \end{equation} \tag{ 18 }$

Hence, for arbitrary positive numbers β and μ, there exist $u\in \frac{\beta }{\lambda }\partial \Upsilon (f)$ and $b\in \frac{1}{\mu }\partial \varphi (Bf)$ such that from (13) and (18), we have that

$\begin{equation} 0= \nabla _{f}H(f,b) +\frac{\lambda }{\beta }u. \end{equation} \tag{ 19 }$

By using relation (9), equation (16) is a direct consequence of $b\in \frac{1}{\mu }\partial \varphi (Bf)$ and (14). Multiplying the inclusion $u\in \frac{\beta }{\lambda }\partial \Upsilon (f)$ by the diagonal matrix S and using (11) yields the relation Su ∈ ∂ϒ(f). Applying (9) to the above inclusion, we have that

$\begin{equation} f = \mathrm{prox}_{\Upsilon }(f+Su). \end{equation} \tag{ 20 }$

Solving u from (19) and substituting it into (20) yields (17).

Conversely, if β and μ are given positive numbers and S is a matrix in $\mathbb {S,}$ such that (f, b) is a solution of (16)–(17), then all the arguments discussed above are reversible. This completes the proof. □

The characterization of the solutions to the minimization problem (7) in theorem 3.1 is essential for deriving other equivalent ones with the aim of developing efficient algorithms for finding the solutions of the variational problem. As an example, an alternative formulation for the minimization problem (7) based on theorem 3.1 is presented below.

Proposition 3.2. Let $A\in \mathbb {A}$ , $g\in \mathbb {R}^m_+$ , γ and λ be two positive numbers, φ be a proper convex nonnegative function defined on $\mathbb {R}^n$ , and B be an n × d matrix. If $f\in \mathbb {R}^{d}_+$ is a solution of the minimization problem (7), then for any β, μ > 0 and $S\in \mathbb {S}$ , there exists $b\in \mathbb {R}^{n}$ such that the pair $(f,b) \in \mathbb {R}^d_+ \times \mathbb {R}^{n}$ is a solution of the coupled equations

$\begin{equation} b = (\mathcal {I}-\mathrm{prox}_{{\mu }^{-1}\varphi })(b+\sigma \nabla _{b}H (\mathrm{prox}_{\Upsilon }( f-\tau S \nabla _{f}H(f,b) ),b ) ), \end{equation} \tag{ 21 }$

$\begin{equation} f = \mathrm{prox}_{\Upsilon }(f-\tau S \nabla _{f}H(f,b) ). \end{equation} \tag{ 22 }$

Conversely, if there exist β, μ > 0, $S \in \mathbb {S}$ , $b\in \mathbb {R}^{n}$ and $f\in \mathbb {R}^{d}_+$ such that the above equations hold, then f is a solution of the minimization problem (7).

From the given assumptions, the existence of a fixed point (f_⋆, b_⋆) of the coupled equations (21) and (22) is a direct consequence of proposition 2.1 and theorem 3.1 together with proposition 3.2.

Based on the above characterization, the following Picard iteration is adopted to find a solution of equations (21) and (22): given any initial $(f^{(0)},b^{(0)}) \in \mathbb {R}^{d}_+ \times \mathbb {R}^{n}$ , for any k = 0, 1, ..., we compute

$\begin{equation} \fl \left\lbrace \begin{array}{@{}l}b^{(k+1)} = (\mathcal {I}-\mathrm{prox}_{{\mu }^{-1}\varphi })(b^{(k)}+\sigma \nabla _{b}H (\mathrm{prox}_{\Upsilon }( f^{(k)}-\tau S \nabla _{f}H(f^{(k)},b^{(k)}) ),b^{(k)} ) ) \\\ms f^{(k+1)} = \mathrm{prox}_{\Upsilon }(f^{(k)}-\tau S \nabla _{f}H(f^{(k)},b^{(k+1)})). \end{array} \right. \end{equation} \tag{ 23 }$

We show next that both $\mathcal {I}-\mathrm{prox}_{{\mu }^{-1}\varphi }$ and prox_ϒ are projections. Since the iterative scheme (23) applies these two projections alternately, we shall call scheme (23) an alternating projection algorithm. It is of particularly interest that S in the characterization of proposition 3.2 can be viewed as a preconditioner in our developed iterative scheme (23); therefore, it makes the algorithm practically tractable. For this reason, we call this matrix the preconditioning matrix.

To better prepare the convergence analysis of this algorithm, which will be the focus of the next section, we present an equivalent form of (23). This new form is as follows:

$\begin{equation} \left\lbrace \begin{array}{@{}l@{}}h^{(k)}:=\mathrm{prox}_{\Upsilon }( f^{(k)}-\tau S \nabla _{f}H(f^{(k)},b^{(k)})), \\\ms b^{(k+1)} =(\mathcal {I}-\mathrm{prox}_{{\mu }^{-1}\varphi }) (b^{(k)}+\sigma \nabla _{b}H (h^{(k)},b^{(k)} ) ), \\\ms f^{(k+1)} = \mathrm{prox}_{\Upsilon }(f^{(k)}-\tau S \nabla _{f}H(f^{(k)},b^{(k+1)})). \end{array} \right. \end{equation} \tag{ 24 }$

We now derive a technical lemma to show that the operator $\mathcal {I}-\mathrm{prox}_{{\mu }^{-1}\varphi }$ in (24) is a projection operator on a closed convex set of $\mathbb {R}^n$ . To this end, we need to review the concept of the conjugate function and some related results. Let $\psi : \mathbb {H} \rightarrow \mathbb {R}\cup \lbrace +\infty \rbrace$ be a proper function. The function $\psi ^*: \mathbb {H} \rightarrow \mathbb {R} \cup \lbrace +\infty \rbrace$ defined, at $u \in \mathbb {H}$ , by

$\begin{eqnarray*} &&\psi ^* (u) : = \sup \lbrace \langle x, u \rangle - \psi (x): x \in \mathbb {H} \rbrace \end{eqnarray*}$

is called the conjugate of ψ at u. For a proper lower semicontinuous convex function ψ on $\mathbb {H}$ and any positive number α, the proximity operator of ψ and that of its conjugate ψ* satisfy the following relation [5, theorem 14.3(ii)] and [31, proposition 4.a]:

$\begin{equation} \mathcal {I}=\mathrm{prox}_{\alpha \psi }+\alpha \,\mathrm{prox}_{\alpha ^{-1}\psi ^*}\circ \alpha ^{-1}\mathcal {I}. \end{equation} \tag{ 25 }$

Next, we recall the notion of the positive homogeneous function. A function $\psi : \mathbb {H} \rightarrow \lbrace -\infty \rbrace \cup \mathbb {R}\cup \lbrace +\infty \rbrace$ is positive homogeneous if for any $x\in \mathbb {H}$ and any positive number α:

$\begin{equation} \psi (\alpha x)=\alpha \psi (x). \end{equation} \tag{ 26 }$

Clearly, ψ(0) = 0. Furthermore, if ψ is a positive homogeneous and proper lower semicontinuous convex function on $\mathbb {H,}$ then the conjugate function ψ* is the indicator function on the set ∂ψ(0) (see, e.g., [5, 39]):

$\begin{equation} \psi ^* = \iota _{\partial \psi (0)}. \end{equation} \tag{ 27 }$

Lemma 3.3. If ψ is a positive homogeneous and proper lower semicontinuous convex function on $\mathbb {H}$ and α is a positive number, then $\mathcal {I}-\mathrm{prox}_{\alpha \psi }$ is the projection operator on the set α∂ψ(0):

$\begin{equation} \mathcal {I}-\mathrm{prox}_{\alpha \psi } = \mathrm{prox}_{\iota _{\alpha \partial \psi (0)}}. \end{equation} \tag{ 28 }$

Proof. From (27), we know that α⁻¹ψ* = α⁻¹ι_∂ψ(0) = ι_∂ψ(0). Combining this relation with (25), we obtain that

$\begin{eqnarray*} &&\mathcal {I}-\mathrm{prox}_{\alpha \psi } = \alpha \,\mathrm{prox}_{\iota _{\partial \psi (0)}}\circ \alpha ^{-1}\mathcal {I} = \mathrm{prox}_{\iota _{\alpha \partial \psi (0)}}, \end{eqnarray*}$

proving the desired formula. □

By lemma 3.3, we conclude that $\mathcal {I}-\mathrm{prox}_{{\mu }^{-1}\varphi }$ in (24) is the same as $\mathrm{prox}_{\iota _{\mu ^{-1}\partial \varphi (0)}}$ . Hence, each equation in the iterative algorithm (24) involves an operator of the form $\mathrm{prox}_{\iota _C}(\cdot - T(\nabla G)(\cdot ))$ for a convex set C in $\mathbb {H}$ , a function G defined on $\mathbb {H}$ and a symmetric positive-definite matrix T mapping $\mathbb {H}$ to itself.

We shall end this section by providing a more general characterization of the solutions of the minimization problem (7). To this end, for a proper convex nonnegative function φ on $\mathbb {R}^n$ , an n × d matrix B, a given $h \in \mathbb {R}^d$ , a positive number μ and a positive integer r, we define $Q_h^r: \mathbb {R}^{n} \rightarrow \mathbb {R}^{n}$ , with

$\begin{eqnarray*} &&Q_h^0:= \mathcal {I}\ \ \mbox{and}\ \ Q_h:=(\mathcal {I}-\mathrm{prox}_{{\mu }^{-1}\varphi })(\cdot +B h), \end{eqnarray*}$

recursively, by

$\begin{equation} Q_h^r: = Q_h \circ Q_h^{r-1}. \end{equation} \tag{ 29 }$

By using equation (14) and the proof of proposition 3.2, we may establish the following result.

Proposition 3.4. Let $A\in \mathbb {A}$ , $g\in \mathbb {R}^m_+$ , γ and λ be two positive numbers, φ be a proper convex nonnegative function defined on $\mathbb {R}^n$ and B be an n × d matrix. If $f\in \mathbb {R}^{d}_+$ is a solution of the minimization problem (7), then for any β, μ > 0 and $S\in \mathbb {S}$ , there exists $b\in \mathbb {R}^{n}$ such that the pair $(f,b) \in \mathbb {R}^d_+ \times \mathbb {R}^{n}$ is a solution of the coupled equations

$\begin{eqnarray} &&b = Q_f^r(b) \end{eqnarray} \tag{ 30 }$

$\begin{eqnarray} &&f = \mathrm{prox}_{\Upsilon }(f-\tau S \nabla _{f}H(f,b) ). \end{eqnarray} \tag{ 31 }$

Conversely, if there exist β, μ > 0, $S \in \mathbb {S}$ , $b\in \mathbb {R}^{n}$ and $f\in \mathbb {R}^{d}_+$ such that the above equations hold, then f is a solution of the minimization problem (7).

Essentially, both theorem 3.1 and proposition 3.2 can be viewed as special cases of proposition 3.4 with r = 1. The purpose of introducing the operator Q^r_f with r greater than 1 is mainly from consideration of developing efficient algorithms in sections 5 and 6.

4. Convergence analysis

We analyze in this section convergence of the preconditioned alternating projection algorithm (PAPA) described in the last section.

The convergence consideration of the algorithm requires introducing the notion of the weighted norm. For a symmetric and positive-definite matrix $T: \mathbb {H}\rightarrow \mathbb {H}$ , we define the weighted inner product by

$\begin{eqnarray*} &&\langle x,y \rangle _{T} := \langle T^{-1}x,y \rangle , \quad x,y\in \mathbb {H}. \end{eqnarray*}$

The induced weighted norm is accordingly defined by

$\begin{eqnarray*} &&\Vert x \Vert _{T} :=\sqrt{\langle x,x \rangle _{T}}. \end{eqnarray*}$

Note that a symmetric positive-definite matrix has precisely one symmetric positive-definite square root. Hence, we can rewrite $\langle x,x \rangle _{T}=\langle T^{-\frac{1}{2}}x,T^{-\frac{1}{2}}x \rangle$ , which in turn implies that the above weighted norm is a norm on $\mathbb {H}$ .

Next, we present a lemma that provides a tool for the proof of the convergence of the sequence generated by the iterative scheme (23). For any initial pair $(f^{(0)},b^{(0)}) \in \mathbb {R}^{d}_+ \times \mathbb {R}^n$ , we let

$\begin{eqnarray*} &&\mathcal {U}:=\lbrace (f^{(k)},b^{(k)}): k \in \mathbb {N}\rbrace ,\ \ \mbox{where}\ \ \mathbb {N}:=\lbrace 1,2,\ldots \rbrace \end{eqnarray*}$

be the sequence generated by the iterative scheme (23).

Lemma 4.1. Let σ and τ be the two positive numbers defined by (15), H be a function on $\mathbb {R}^d \times \mathbb {R}^n$ defined by (12), $S\in \mathbb {S}$ , and φ be a positive homogeneous and convex function on $\mathbb {R}^n$ . If the following conditions hold:

the sequence $\mathcal {U}$ is bounded,
lim_{k → ∞}||f^{(k + 1)} − f^(k)||_S = lim_{k → ∞}||b^{(k + 1)} − b^(k)|| = 0,

then there exists a subsequence of $\mathcal {U}$ that converges to a solution of the coupled equations (21) and (22).

Proof. Since by hypothesis $\mathcal {U}$ is a bounded sequence in $\mathbb {R}^{d}_+\times \mathbb {R}^{n}$ , there exists a subsequence $\lbrace (f^{(k_i)},b^{(k_i)}): i \in \mathbb {N}\rbrace$ that converges to a point $(\widehat{f},\widehat{b})$ in $\mathbb {R}^{d}_+\times \mathbb {R}^{n}$ . This together with condition (ii) ensures that the subsequence $\lbrace (f^{(k_i+1)},b^{(k_i+1)}): i \in \mathbb {N}\rbrace$ converges to the same point. Therefore, in (23), by choosing k := k_i and letting i → ∞, we conclude that the point $(\widehat{f},\widehat{b})$ satisfies the coupled equations (21)–(22). □

The use of lemma 4.1 in the proof of the convergence result requires that we verify the hypotheses of the lemma. This is fulfilled by establishing the following estimate on the quantities defined by

$\begin{eqnarray*} &&{\rm e}^{(k)}:=\frac{1}{2\tau }{\Vert f^{(k)}-f_\star \Vert _{S}^2}+\frac{1}{2\sigma }{\Vert b^{(k)}-b_\star \Vert ^2} \end{eqnarray*}$

for a solution (f_⋆, b_⋆) of equations (21)–(22) that

$\begin{eqnarray} &&\fl {\rm e}^{(k)}+\frac{1}{2\tau }\sum _{j=0}^{k-1}{\Vert f^{(j+1)}-h^{(j)}\Vert _{S}^2}+\frac{1}{2\tau }\sum _{j=0}^{k-1}C_j{\Vert h^{(j)}-f^{(j)}\Vert _{S}^2}\nonumber\\ &&+\frac{1}{2\sigma }\sum _{j=0}^{k-1}D_j{\Vert b^{(j+1)}-b^{(j)}\Vert ^2} \le {\rm e}^{(0)}, \end{eqnarray} \tag{ 32 }$

where h^(j) is defined as in (24), and C_j and D_j are some positive constants.

We now establish several results needed for proving estimate (32). For a proper convex function $\psi : \mathbb {H} \rightarrow \mathbb {R} \cup \lbrace +\infty \rbrace$ and a symmetric positive-definite matrix T, the proximity operator of ψ with respect to T, denoted by prox^T_ψ, is defined for a given vector $x \in \mathbb {H}$ by

$\begin{eqnarray*} &&\mathrm{prox}_{\psi }^{T}(x):=\arg \min \big\lbrace \psi (u)+\case{1}{2}\Vert u-x\Vert _{T}^2: u\in \mathbb {H} \big\rbrace . \end{eqnarray*}$

Clearly, we have that prox_ψ = prox^I_ψ. Moreover, for x in the domain of ψ and $y \in \mathbb {H}$ , we have the following generalization of (9):

$\begin{equation} y\in T\partial {\psi }(x) \quad \mbox{if and only if} \quad x=\mathrm{prox}_{\psi }^{T}(x+y). \end{equation} \tag{ 33 }$

We next present a reformulation of the proximity operator prox_ϒ.

Proposition 4.2. If $S \in \mathbb {S}$ , then

$\begin{equation} \mathrm{prox}_{\Upsilon }=\mathrm{prox}_{\Upsilon }^{S}. \end{equation} \tag{ 34 }$

Proof. Let $x \in \mathbb {R}^d$ and p = prox_ϒ(x). By relation (9), we have the inclusion x − p ∈ ∂ϒ(p). Since $p\in \mathbb {R}^{d}_{+}$ , using equation (11), we obtain that x − p ∈ S∂ϒ(p). By relation (33), it yields that p = prox^S_ϒ(x), which completes the proof. □

With proposition 4.2 in mind, we next discuss several properties of the operator $\mathrm{prox}_{\iota _C}^{T} (\cdot - T(\nabla G)(\cdot ))$ , which are crucial in our convergence analysis.

Lemma 4.3. Let C be a nonempty closed convex subset of $\mathbb {H}$ , T be a symmetric positive-definite matrix mapping $\mathbb {H}$ to itself and G be a proper convex differentiable function on $\mathbb {H}$ . Set

$\begin{equation} p:=\mathrm{prox}_{\iota _C}^{T}(x-T\nabla G(x)). \end{equation} \tag{ 35 }$

Then, the following statements hold.

For any y ∈ C,
$\begin{equation} \Vert p-y \Vert _{T}^2\le \Vert x-y \Vert _{T}^2-\Vert p-x \Vert _{T}^2-2\langle \nabla G(x),p-y \rangle \end{equation} \tag{ 36 }$
and
$\begin{equation} \fl \Vert p-y \Vert _{T}^2\le \Vert x-y \Vert _{T}^2-\Vert p-x \Vert _{T}^2-2\langle \nabla G(x), p-x \rangle +2\left[ G(y)-G(x) \right]. \end{equation} \tag{ 37 }$
Furthermore, if G is also differentiable at p, then for any y ∈ C,
$\begin{eqnarray} &&\fl \Vert p-y \Vert _{T}^2\le \Vert x-y \Vert _{T}^2-\Vert p-x \Vert _{T}^2-2\langle \nabla G(x)-\nabla G(p), p-x \rangle +2\left[ G(y)-G(p) \right]. \nonumber\\ \end{eqnarray} \tag{ 38 }$

Proof. We first prove the inequalities in item (i). Note that for any $y \in \mathbb {H}$ , we have that

$\begin{equation} \Vert p-y \Vert _{T}^2 =\Vert x-y \Vert _{T}^2-\Vert p-x \Vert _{T}^2+2\langle T^{-1}(p-x),p-y \rangle . \end{equation} \tag{ 39 }$

By equations (35) and (9), we have the inclusion relation

$\begin{equation*} x-T\nabla G(x)-p\in T\partial \iota _C(p). \end{equation*}$

Multiplying the above inclusion by the symmetric positive-definite matrix T⁻¹ and recalling the definition of the subdifferential, we have for all y ∈ C that

$\begin{eqnarray*} &&\langle T^{-1}(x-T\nabla G(x)-p),y-p \rangle \le \iota _C(y)-\iota _C(p)= 0. \end{eqnarray*}$

By splitting the term 〈T⁻¹(p − x), p − y〉 as the sum of 〈T⁻¹(x − T∇G(x) − p), y − p〉 and 〈 − ∇G(x), p − y〉, and using the above inequality, from (39) we conclude that inequality (36) holds. Inequality (37) follows from (36) together with

$\begin{eqnarray*} &&\langle \nabla G(x),p-y \rangle = \langle \nabla G(x),p-x \rangle + \langle \nabla G(x),x-y \rangle \end{eqnarray*}$

and the inequality

$\begin{eqnarray*} &&\langle \nabla G(x),x-y \rangle \ge G(x)-G(y) \end{eqnarray*}$

ensured by the convexity of G.

Finally, since G is differentiable at p, again by the convexity of G, we have that

$\begin{eqnarray*} &&\langle \nabla G(p),p-x \rangle \ge G(p)-G(x). \end{eqnarray*}$

This together with (37) leads to (38). □

As a consequence of lemma 4.3 with G being identical to zero, using (36) twice, we derive for all $x,y \in \mathbb {H}$ that

$\begin{equation} \Vert \mathrm{prox}_{\iota _C}^{T}(x)-\mathrm{prox}_{\iota _C}^{T}(y)\Vert _{T} \le \Vert x-y\Vert _{T}. \end{equation} \tag{ 40 }$

We next show that a solution of the coupled equations (16) and (17) is a saddle point of the function H. This result is necessary for establishing inequality (32).

Lemma 4.4. Let H be a function on $\mathbb {R}^d \times \mathbb {R}^n$ defined by (12). If $(f_\star , b_\star ) \in \mathbb {R}^{d}_+ \times \mathbb {R}^{n}$ is a solution of the coupled equations (16) and (17), then inequality

$\begin{equation} H(f_\star , b) \le H(f_\star , b_\star ) \le H(f, b_\star ) \end{equation} \tag{ 41 }$

holds for any point $(f, b) \in \mathbb {R}^{d}_+ \times \left({\mu }^{-1} \partial \varphi (0) \right)$ .

Proof. We prove this lemma by showing the following two inequalities:

$\begin{equation} H(f_\star , b) \le H(f_\star ,b_\star ), \ \ \mbox{for all}\ \ b \in \left({\mu }^{-1} \partial \varphi (0) \right) \end{equation} \tag{ 42 }$

and

$\begin{equation} H(f_\star , b_\star ) \le H(f, b_\star ), \quad \mbox{for all}\ \ f\in \mathbb {R}^d_+. \end{equation} \tag{ 43 }$

We first prove inequality (42). By the definition of b_⋆ and (28), we have that

$\begin{eqnarray*} &&b_\star =\mathrm{prox}_{\iota _{{\mu }^{-1} \partial \varphi (0)}}(b_\star + Bf_\star ). \end{eqnarray*}$

Employing the characterization of the projection, we observe that

$\begin{eqnarray*} &&\langle Bf_\star , b-b_\star \rangle =\langle (b_\star + Bf_\star )-b_\star , b-b_\star \rangle \le 0,\ \ \mbox{for all} \ \ b \in \left({\mu }^{-1} \partial \varphi (0) \right), \end{eqnarray*}$

which is equivalent to inequality (42).

It remains to show inequality (43). Using relations (9) and (34), from (17) we obtain for any $S\in \mathbb {S}$ that

$\begin{eqnarray*} &&0 \in S\nabla _f H(f_\star ,b_\star )+ S\partial \iota _{\mathbb {R}^{d}_+}(f_\star ). \end{eqnarray*}$

Multiplying the above inclusion by the matrix S⁻¹ guarantees that H( ·, b_⋆) achieves its minimum value at the point f_⋆. Thus, inequality (43) is valid. □

We next prove estimate (32) by employing lemmas 4.3 and 4.4. To this end, for an $S\in \mathbb {S}$ , we define the quantities

$\begin{eqnarray*} A_{j}:=\left\lbrace \begin{array}{@{}ll@{}}\dsty \frac{\Vert \nabla _{f}H(h^{(j)},b^{(j+1)})-\nabla _{f}H(f^{(j)},b^{(j+1)})\Vert _{S^{-1}}}{\Vert h^{(j)}-f^{(j)} \Vert _{S}}, & \hbox{if $\Vert h^{(j)}-f^{(j)} \Vert _{S} \ne 0$,} \\ \dsty 0, & \hbox{otherwise,} \end{array} \right. \end{eqnarray*}$

and

$\begin{eqnarray*} B_{j}:=\left\lbrace \begin{array}{@{}ll@{}}\dsty\frac{\Vert \nabla _{f}H(f^{(j)},b^{(j+1)})-\nabla _{f}H(f^{(j)},b^{(j)})\Vert }{\Vert b^{(j+1)}-b^{(j)}\Vert }, & \hbox{if $\Vert b^{(j+1)}-b^{(j)}\Vert \ne 0$,} \\ \dsty 0, & \hbox{otherwise.} \end{array} \right. \end{eqnarray*}$

Lemma 4.5. If (f_⋆, b_⋆) is a solution of the coupled equations (21)–(22), then the estimate (32) holds with C_j := 1 − 2τA_j and D_j := 1 − 2τ||B||₂||S||₂B_j.

Proof. Let j be a positive integer. Identifying x, y, p, T and G in (38), respectively, with b^(j), b_⋆, b^{(j + 1)}, I and −σH(h^(j), ·), and recalling b^{(j + 1)} being defined by the projection (24), we have that

$\begin{eqnarray} &&\fl \frac{1}{2\sigma }\Vert b^{(j+1)}-b_\star \Vert ^2 \le \frac{1}{2\sigma }\Vert b^{(j)}-b_\star \Vert ^2-\frac{1}{2\sigma }\Vert b^{(j+1)}-b^{(j)}\Vert ^2+H(h^{(j)},b^{(j+1)})-H(h^{(j)},b_\star ), \nonumber\\ \end{eqnarray} \tag{ 44 }$

where we have used the fact that ∇_bH(f, ·) is a constant. Likewise, identifying x, y, p, T and G in (37), respectively, with f^(j), f_⋆, f^{(j + 1)}, S and τH( ·, b^{(j + 1)}), recalling f^{(j + 1)} being defined by the projection (23) and applying proposition 4.2, we observe that

$\begin{eqnarray} &&\fl \frac{1}{2\tau }\Vert f^{(j+1)}-f_\star \Vert _S^2 \le \frac{1}{2\tau }\Vert f^{(j)}-f_\star \Vert _S^2-\frac{1}{2\tau }\Vert f^{(j+1)}-f^{(j)}\Vert _S^2 \nonumber \\ &&+\,\langle \nabla _{f}H(f^{(j)},b^{(j+1)}), f^{(j)}-f^{(j+1)} \rangle+H(f_\star ,b^{(j+1)})-H(f^{(j)},b^{(j+1)}). \end{eqnarray} \tag{ 45 }$

Using the inequality

$\begin{eqnarray*} &&H(f_\star ,b^{(j+1)})\le H(h^{(j)},b_\star ), \end{eqnarray*}$

which is ensured by lemma 4.4, in the sum of the last two terms of (44) and (45) yields

$\begin{eqnarray} &&\fl H(h^{(j)},b^{(j+1)})-H(f^{(j)},b^{(j+1)}) + H(f_\star ,b^{(j+1)})-H(h^{(j)},b_\star ) \nonumber\\ &&\le H(h^{(j)}, b^{(j+1)})-\,H(f^{(j)}, b^{(j+1)}). \end{eqnarray} \tag{ 46 }$

The convexity of H( ·, b^{(j + 1)}) ensures that

$\begin{eqnarray*} &&\langle \nabla _{f}H(h^{(j)},b^{(j+1)}), f^{(j)}-h^{(j)} \rangle \le H(f^{(j)},b^{(j+1)})-H(h^{(j)},b^{(j+1)}), \end{eqnarray*}$

which combined with (46) gives

$\begin{eqnarray} &&\fl H(h^{(j)},b^{(j+1)})-H(f^{(j)},b^{(j+1)}) + H(f_\star ,b^{(j+1)})-H(h^{(j)},b_\star ) \nonumber\\ &&\le \langle \nabla _{f}H(h^{(j)},b^{(j+1)}), h^{(j)}-f^{(j)} \rangle . \end{eqnarray} \tag{ 47 }$

Moreover, identifying x, y, p, T and G in (36), respectively, with f^(j), f^{(j + 1)}, h^(j), S and τH( ·, b^(j)), and recalling definition (24) of h^(j) in terms of the projection, we obtain that

$\begin{eqnarray} &&\fl \frac{1}{2\tau }\Vert f^{(j+1)}-f^{(j)}\Vert _S^2\ge \frac{1}{2\tau }\Vert f^{(j+1)}-h^{(j)}\Vert _S^2+\frac{1}{2\tau }\Vert h^{(j)} \nonumber\\ &&-\,f^{(j)}\Vert _S^2 + \langle \nabla _{f}H(f^{(j)},b^{(j)}),h^{(j)}-f^{(j+1)} \rangle . \end{eqnarray} \tag{ 48 }$

Recalling the definition of e^(j), summing (44) and (45), and using (48) and (47) in the resulting sum yield

$\begin{eqnarray} &&\fl {\rm e}^{(j+1)}-{\rm e}^{(j)}\le -\frac{1}{2\tau }{\Vert f^{(j+1)}-h^{(j)}\Vert _S^2} -\frac{1}{2\tau }{\Vert h^{(j)}-f^{(j)}\Vert _S^2}-\frac{1}{2\sigma }{\Vert b^{(j+1)}-b^{(j)}\Vert ^2}+I_{1}+I_{2}, \nonumber\\ \end{eqnarray} \tag{ 49 }$

where

$\begin{eqnarray*} &&I_1 := \langle \nabla _{f}H(f^{(j)},b^{(j+1)})-\nabla _{f}H(h^{(j)},b^{(j+1)}),f^{(j)}-h^{(j)} \rangle , \nonumber \\&&\ms I_2 := \langle \nabla _{f}H(f^{(j)},b^{(j)})-\nabla _{f}H(f^{(j)},b^{(j+1)}),f^{(j+1)}-h^{(j)} \rangle . \nonumber \end{eqnarray*}$

Next, we further estimate the last two inner products in (49). Note that S and S⁻¹ are diagonal and positive-definite matrices; and hence, they have unique diagonal positive-definite square roots $S^{\frac{1}{2}}$ and $S^{-\frac{1}{2}}$ , respectively. We then have that

$\begin{eqnarray} I_1 & = & \langle S^{\frac{1}{2}} (\nabla _{f}H(f^{(j)},b^{(j+1)})-\nabla _{f}H(h^{(j)}, b^{(j+1)})),S^{-\frac{1}{2}}(f^{(j)}-h^{(j)}) \rangle \nonumber \\\ms & \le & \Vert \nabla _{f}H(f^{(j)},b^{(j+1)})-\nabla _{f}H(h^{(j)},b^{(j+1)})\Vert _{S^{-1}}\cdot \Vert f^{(j)}-h^{(j)}\Vert _{S}. \end{eqnarray} \tag{ 50 }$

The last inequality follows from using the Cauchy–Schwartz inequality together with the definition of the weighted norm. Likewise, we can also obtain that

$\begin{eqnarray} I_2 & \le & \Vert \nabla _{f}H(f^{(j)},b^{(j)})-\nabla _{f}H(f^{(j)},b^{(j+1)})\Vert _{S^{-1}}\cdot \Vert f^{(j+1)}-h^{(j)}\Vert _{S}. \end{eqnarray} \tag{ 51 }$

From the definitions of f^{(j + 1)} in (23) and h^(j) in (24) and inequality (40), we obtain

$\begin{eqnarray} \Vert f^{(j+1)}-h^{(j)}\Vert _S & \le & \tau \Vert S ( \nabla _f H(f^{(j)},b^{(j+1)})-\nabla _f H(f^{(j)},b^{(j)}))\Vert _S \nonumber \\ & = & \tau \Vert S^{\frac{1}{2}}( \nabla _f H(f^{(j)},b^{(j+1)})-\nabla _f H(f^{(j)},b^{(j)}))\Vert . \end{eqnarray} \tag{ 52 }$

On the other hand, since

$\begin{eqnarray*} &&\nabla _{f}H(f^{(j)},b^{(j)})-\nabla _{f}H(f^{(j)},b^{(j+1)})=\lambda \mu B^\top (b^{(j)}-b^{(j+1)}), \end{eqnarray*}$

this equation with (52) yields that

$\begin{eqnarray*} &&\Vert f^{(j+1)}-h^{(j)}\Vert _{S} \le \tau \lambda \mu \Vert B\Vert _2 \Vert S^{\frac{1}{2}} \Vert _2 \Vert b^{(j)}-b^{(j+1)}\Vert , \end{eqnarray*}$

which together with the estimate (51) implies that

$\begin{equation} I_2 \le \lambda \mu \tau B_j \Vert B\Vert _2 \Vert S \Vert _2 \Vert b^{(j+1)}-b^{(j)}\Vert ^2. \end{equation} \tag{ 53 }$

Combining inequalities (49), (50) and (53) together with using the definition of A_j gives the estimate

$\begin{eqnarray} &&\fl {\rm e}^{(j+1)}- {\rm e}^{(j)} \le -\frac{1}{2\tau }{\Vert f^{(j+1)}-h^{(j)}\Vert _S^2} -\frac{C_j}{2\tau }{\Vert h^{(j)}-f^{(j)}\Vert _S^2} -\frac{D_j}{2\sigma }{\Vert b^{(j+1)}-b^{(j)}\Vert ^2}. \end{eqnarray} \tag{ 54 }$

Summing the above inequality (54) for j running from 0 to k yields estimate (32). □

In order to use the last lemma to show the validity of the hypotheses of lemma 4.1, we bound the constants A_j and B_j that appear in the last lemma. To this end, we need the following lemma that pertains to the Lipschitz continuity of the gradient of H with respect to each of its variables.

Lemma 4.6. If H is a function on $\mathbb {R}^d \times \mathbb {R}^n$ defined by (12) and $S\in \mathbb {S}$ , then for a fixed vector b in $\mathbb {R}^{n}$ , ∇_fH( ·, b) is Lipschitz continuous with the constant $\frac{\Vert g\Vert _{\infty }\Vert A\Vert _{2}^{2}\Vert S \Vert _2}{\gamma ^2}$ , while for a fixed vector f in $\mathbb {R}^d$ , ∇_fH(f, ·) is Lipschitz continuous with the constant λμ||B||₂. That is, for any $f_1, f_2 \in \mathbb {R}^d_+$ and $b \in \mathbb {R}^{n}$

$\begin{equation} \Vert \nabla _{f}H(f_{1},b)-\nabla _{f}H(f_{2},b)\Vert _{S^{-1}} \le \frac{\Vert g\Vert _{\infty }\Vert A \Vert _{2}^{2}\Vert S \Vert _2}{\gamma ^2}\Vert f_1-f_2 \Vert _{S} \end{equation} \tag{ 55 }$

and for any $b_1, b_2\in \mathbb {R}^{n}$ and $f \in \mathbb {R}^{d}_+$

$\begin{equation} \Vert \nabla _{f}H(f,b_1)-\nabla _{f}H(f,b_2)\Vert \le \lambda \mu \Vert B\Vert _2 \Vert b_1-b_2 \Vert . \end{equation} \tag{ 56 }$

Proof. Using (13) and the definition of the weighted norm, for any $f_1, f_2 \in \mathbb {R}^d_+$ and $b \in \mathbb {R}^{n}$ , we have that

$\begin{eqnarray*} &&\Vert \nabla _{f}H(f_{1},b)-\nabla _{f}H(f_{2},b)\Vert _{S^{-1}} =\left\Vert S^{\frac{1}{2}} A^\top \left(\frac{(AS^{\frac{1}{2}}S^{-\frac{1}{2}}(f_2-f_1)) \odot g}{(Af_1+\gamma ) \odot (Af_2+\gamma )}\right)\right\Vert . \end{eqnarray*}$

It can be verified that

$\begin{eqnarray*} &&\fl \Vert (Af_1+\gamma ) \odot (Af_2+\gamma )\Vert _\infty \ge \gamma ^2 \\ &&\mbox{and} \ \ \Vert (AS^{\frac{1}{2}}S^{-\frac{1}{2}}(f_2-f_1)) \odot g\Vert \le \Vert A\Vert _2 \Vert S^{\frac{1}{2}} \Vert _2 \Vert g\Vert _\infty \Vert f_1-f_2\Vert _{S}, \end{eqnarray*}$

from which inequality (55) follows.

Inequality (56) follows directly from the expression of ∇_fH given in (13). □

Finally, we show the convergence of the sequence $\mathcal {U}:=\lbrace (f^{(k)},b^{(k)}): k \in \mathbb {N}\rbrace$ generated by the PAPA (23).

Theorem 4.7. Let λ, μ, γ be positive numbers, σ and τ be the numbers defined by (15), $S\in \mathbb {S}$ , φ be a positive homogeneous convex function on $\mathbb {R}^n$ , and H be the function on $\mathbb {R}^d \times \mathbb {R}^n$ defined by (12). If β and μ are chosen to satisfy the conditions

$\begin{eqnarray*} &&0<\beta \le \frac{(1-\varepsilon )\lambda \gamma ^2}{2\Vert g \Vert _{\infty }\Vert A\Vert _{2}^{2}\Vert S \Vert _2},\quad 0<\beta \mu \le \frac{1-\varepsilon }{2\Vert B\Vert _{2}^{2}\Vert S \Vert _2} \end{eqnarray*}$

for some ε ∈ (0, 1), then for any initial pair $(f^{(0)},b^{(0)}) \in \mathbb {R}^{d}_+ \times \mathbb {R}^{n}$ , the sequence $\mathcal {U}$ converges to a solution of the coupled equations (21)–(22).

Proof. We prove this theorem by employing lemma 4.1. The conditions imposed on β and μ together with lemma 4.6 lead to the fact that both 1 − 2τA_j and 1 − 2τ||B||₂||S||₂B_j are bounded below by epsilon for all j. Hence, lemma 4.5 implies that the sequence $\lbrace (f^{(j)},b^{(j)}): j \in \mathbb {N}\rbrace$ is bounded and the sequence $\lbrace \Vert b^{(j+1)}-b^{(j)}\Vert : j \in \mathbb {N}\rbrace$ converges to 0, and moreover, both the sequences $\lbrace \Vert h^{(j)}-f^{(j)}\Vert _{S}: j \in \mathbb {N}\rbrace$ and $\lbrace \Vert f^{(j+1)}-h^{(j)}\Vert _{S}: j \in \mathbb {N}\rbrace$ converge to 0, which implies the convergence of the sequence $\lbrace \Vert f^{(k+1)}-f^{(k)}\Vert _{S}: k \in \mathbb {N}\rbrace$ to 0. Therefore, by lemma 4.1, there exists a subsequence $\lbrace (f^{(k_i)},b^{(k_i)}): i \in \mathbb {N}\rbrace$ which converges to $(\widehat{f},\widehat{b}) \in \mathbb {R}^d_+ \times (\mu ^{-1} \partial \varphi (0))$ , a solution of equations (21)–(22). It remains to show that the sequence $\mathcal {U}$ also converges to the same point $(\widehat{f},\widehat{b})$ . Indeed, inequality (54) ensures that

$\begin{equation} \fl \frac{1}{2\tau }{\Vert f^{(j+1)}-f_\star \Vert _{S}^2}+\frac{1}{2\sigma }{\Vert b^{(j+1)}-b_\star \Vert ^2}\le \frac{1}{2\tau }{\Vert f^{(j)}-f_\star \Vert _{S}^2}+\frac{1}{2\sigma }{\Vert b^{(j)}-b_\star \Vert ^2}. \end{equation} \tag{ 57 }$

Replacing the point (f_⋆, b_⋆) in inequality (57) by the point $(\widehat{f},\widehat{b})$ and summing the resulting inequality for j from k_i to k − 1 with k > k_i leads to the inequality

$\begin{eqnarray*} &&\frac{1}{2\tau }{\Vert f^{(k)}-\widehat{f} \Vert _{S}^2}+\frac{1}{2\sigma }{\Vert b^{(k)}-\widehat{b}\Vert ^2} \le \frac{1}{2\tau }{\Vert f^{(k_i)}-\widehat{f} \Vert _{S}^2}+\frac{1}{2\sigma }{\Vert b^{(k_i)}-\widehat{b}\Vert ^2}. \nonumber \end{eqnarray*}$

This clearly guarantees that the sequence $\mathcal {U}$ converges to the point $(\widehat{f},\widehat{b})$ . □

We remark that the conditions stated in the last theorem imposed on the parameter β and μ are rather restricted. We shall demonstrate in section 6 by numerical examples that choices of significantly larger β and μ than those allowed by the theorem lead to convergence of the iteration.

5. TV-regularized MAP ECT reconstruction

In this section, we specialize the general PAPA developed in section 3 to the MAP ECT reconstruction by specifying the function φ, the matrices B and the preconditioning matrix S in (23). In particular, we present explicit formulas of the proximity operators for the two special convex functions involved in the algorithm. We also discuss the importance of the inner iteration for efficient computation in the context of MAP ECT reconstruction. Finally, we compare differences and advantages of our proposed algorithm with those of several existing algorithms for the Poisson-TV model (7).

We first present explicit expressions of φ and B according to the definition of the TV [40]. The concrete expressions of φ and B depend on how 3D images are vectorized. A 3D image is assembled by a stack of 2D images. For convenience of exposition, we assume that an image considered in this paper has a size of p × p × q. The image is treated as a vector in $\mathbb {R}^{p^2q}$ in such a way that the ijkth voxel of the image, where $i,j \in \mathbb {N}_p$ and $k \in \mathbb {N}_q$ , corresponds to the (i + (j − 1)p + (k − 1)p²)th element of the vector in $\mathbb {R}^{p^2q}$ . In the current section, we set d := p²q. To define the matrix B, we define an α × α difference matrix D_α by

$\begin{eqnarray*} D_\alpha :={\left[\begin{array}{@{}llll@{}}0 & & &\\ -1 &1 & &\\ &\ddots &\ddots &\\ & &{-1} &{1} \\ \end{array}\right]}. \end{eqnarray*}$

In terms of the notion of the matrix Kronecker product ⊗, we define the 3d × d matrix B by

$\begin{eqnarray*} &&B:=\left[\begin{array}{@{}l@{}} I_q \otimes I_{p} \otimes D_p\\ I_q \otimes D_p \otimes I_{p}\\ D_q \otimes I_p \otimes I_p \end{array}\right]. \end{eqnarray*}$

The convex function $\varphi : {\mathbb {R}}^{3d}\rightarrow \mathbb {R}$ is defined at $z \in {\mathbb {R}}^{3d}$ as

$\begin{equation} \varphi (z):=\sum _{i=1}^{d} \left\Vert [z_i, z_{d+i}, z_{2d+i}]^\top \right\Vert . \end{equation} \tag{ 58 }$

The isotropic TV of a vector f is then expressed as φ(Bf).

Execution of the iterative scheme (23) requires the availability of explicit formulas for the proximity operators of functions φ defined in (58) and the indicator function $\Upsilon :=\iota _{\mathbb {R}^d_+}$ . For a positive number μ and a vector $z \in \mathbb {R}^{3d}$ , the components of the vector

$\begin{eqnarray*} &&y:=\mathrm{prox}_{{\mu }^{-1}\varphi }(z) \end{eqnarray*}$

can be computed by using the formula

$\begin{eqnarray*} &&\fl [y_i, y_{d+i}, y_{2d+i}]^\top = \max \left\lbrace \Vert [z_i, z_{d+i}, z_{2d+i}]^\top \Vert -\frac{1}{\mu }, 0\right\rbrace \frac{[z_i, z_{d+i}, z_{2d+i}]^\top }{\Vert [z_i, z_{d+i}, z_{2d+i}]^\top \Vert }, \quad i \in \mathbb {N}_{d}. \end{eqnarray*}$

With this formula, for any positive integer r and a vector $h \in \mathbb {R}^d$ , the operator Q^r_h defined by (29) can be explicitly computed. The proximity operator of the indicator function ϒ (the projection operator onto the first octant $\mathbb {R}^d_+$ ) also has an explicit expression. That is, for $x \in \mathbb {R}^d_+$ ,

$\begin{eqnarray*} &&(\mathrm{prox}_{\Upsilon }(x))_i = \max \lbrace x_i,0\rbrace , \quad i \in \mathbb {N}_d. \end{eqnarray*}$

Thus, both $\mathrm{prox}_{{\mu }^{-1}\varphi }$ and prox_ϒ have close forms in the current context. These close forms are convenient for numerical evaluation of the proximity operators of the two specific functions in the sense that no further optimization problems are required to solve. These specific examples can be evaluated numerically within the machine precision, while, in general, computing the proximity operator of a convex function requires solving an optimization problem by its definition. Even in the situation when closed forms of the proximity operators are available, some round-off errors may be introduced during computation. In such a case, one needs to consider the stability issue of computing the proximity operator.

Based on proposition 3.4, we propose the following algorithm (algorithm 1) for the MAP ECT reconstruction.

.

Algorithm 1 (Alternating projection algorithm for MAP ECT reconstruction with a fixed matrix S)
1: Preparation: ∇_fH, ∇_bH, τ are defined in (13), (14) and (15), respectively. The parameter r is a
positive integer.
2: Initialization: f⁽⁰⁾ = 1, b⁽⁰⁾ = 0.
3: repeat
4: Step 1: h^(k) ← prox_ϒ(f^(k) − τS∇_fH(f^(k), b^(k)))
5: Step 2: $b^{(k+1)} \leftarrow Q_{h^{(k)}}^r(b^{(k)})$
6: Step 3: f^{(k + 1)} ← prox_ϒ(f^(k) − τS∇_fH(f^(k), b^{(k + 1)}))
7: until 'convergence'

The parameter r in algorithm 1 is the iteration number for the inner iteration. When r is chosen to be 1, the three steps in algorithm 1 correspond to the three equations in (24), respectively. In algorithm 1, both steps 1 and 3 involve the matrices A and A^⊤, while step 2 involves matrix B. In the context of the MAP ECT reconstruction, matrix A is a dense matrix of large size and matrix B is a sparse matrix of relatively small size. As a result, computation with the matrix A or A^⊤ is more costly than computation with the matrix B. To reduce the overall computational cost, we suggest that the inner iteration be carried out with an appropriate choice of iteration number r. By paying less computational effort with an appropriate r, we could obtain a more accurate estimate f^{(k + 1)} at the kth outer iterate. We shall study in the next section by numerical examples the choices of the iteration number r.

The preconditioning matrix S is not specified in algorithm 1. The choice of the preconditioning matrix is crucial in designing computationally efficient algorithms. One may choose the preconditioning matrix S as the identity matrix, which corresponds to the trivial case without preconditioning. More interesting cases are the nontrivial choices. The choice of the preconditioning matrix S may be motivated in different ways with the same purpose of speeding up the convergence of the algorithm. A possible nontrivial choice is motivated by the idea of the projected Newton method. For more details on the projected Newton method, see [3, 6]. In passing, we point out that preconditioning techniques were used in the context of ECT for other algorithms (see, e.g., [13, 23, 24, 32]).

We propose a choice of the preconditioner S based on the classical EM. Recall that EM is an iterative scheme for computing the ML estimate. According to [25], when the EM algorithm is applied to (1), we have for any $f^{(0)} \in \mathbb {R}^d_+$ that

$\begin{equation} f^{(k+1)} = E^{(k)} A^\top \left(\frac{g}{Af^{(k)}+\gamma }\right), \end{equation} \tag{ 59 }$

where E^(k) is a diagonal matrix defined by

$\begin{equation} E^{(k)}:=\mathrm{diag}\left( \frac{f^{(k)}}{A^{\top } 1} \right). \end{equation} \tag{ 60 }$

In the EM algorithm, the matrix E^(k) determines the direction for the next step of a search for the minimizer, for the purpose of finding the ML estimate. By comparing (59) with the form of ∇_fH given in (13), motivated by the matrix E^(k) having the form (60), we suggest that we choose the matrix S in algorithm 1 as the diagonal matrix E^(k) at the kth iteration. This choice of the preconditioning matrix allows the search for the minimizer to follow the direction of the search in the classical EM algorithm for finding the ML estimate while preserving the advantage of the alternating projection nature in the proposed algorithm. In this way, the preconditioning matrix is updated at every iterate step when a new value f^(k) is available. This leads to the following algorithm (algorithm 2) for the MAP ECT reconstruction.

.

Algorithm 2 (Preconditioned alternating projection algorithm for MAP ECT reconstruction)
Preparation: ∇_fH, ∇_bH and τ are defined in (13), (14) and (15), respectively. The parameter r is a
positive integer.
Initialization: f⁽⁰⁾ = 1, b⁽⁰⁾ = 0.
repeat
Step 1: $S^{(k)} \leftarrow \mathrm{diag}\left( \frac{f^{(k)}}{A^{\top }1}\right)$
Step 2: $h^{(k)} \leftarrow P_{\mathbb {R}^{d}_{+}}(f^{(k)}-\tau S^{(k)} \nabla _{f}H(f^{(k)},b^{(k)}))$
Step 3: $b^{(k+1)} \leftarrow Q_{h^{(k)}}^r(b^{(k)})$
Step 4: $f^{(k+1)} \leftarrow P_{\mathbb {R}^{d}_{+}}\left(f^{(k)}-\tau S^{(k)} \nabla _{f}H(f^{(k)},b^{(k+1)}) \right)$
until 'convergence'

Since the choice of S is motivated from the EM algorithm (59), we shall call algorithm 2 the EM PAPA for MAP ECT reconstruction and call the matrix E the EM-preconditioner. A numerical comparison of the proposed algorithms 1 and 2 will be presented in the next section. The numerical study shows that the EM-preconditioner speeds up significantly the convergence of the alternating projection algorithm.

We further comment on the dynamics of the EM-preconditioner. The PAPA, which we described in section 3 and for which we proved convergence in section 4, has a fixed preconditioning matrix S. The preconditioner in algorithm 2 changes dynamically from step to step. In the next section, we shall study the dynamics of the EM-preconditioner numerically and shall see that after some iteration steps, the change in the EM-preconditioner is so small that it can be neglected. In other words, the EM-preconditioner tends to a fixed preconditioning matrix as the iteration number increases. Therefore, in practice, we may fix the preconditioning matrix after some iteration steps and propose the following semi-dynamic PAPA (algorithm 3). In this way, the convergence theorem established in section 4 is still applicable.

.

Algorithm 3 (Semi-dynamic PAPA for MAP ECT reconstruction)
Preparation: ∇_fH, ∇_bH, τ are defined in (13), (14) and (15), respectively. The parameters r and l
are positive integers.
Initialization: f⁽⁰⁾ = 1, b⁽⁰⁾ = 0.
Run algorithm 2 until k > l.
Set $S^{(l)} = \mathrm{diag}\left( \frac{f^{(l)}}{A^{\top }1}\right)$ .
repeat
Step 1: $h^{(k)} \leftarrow P_{\mathbb {R}^{d}_{+}}(f^{(k)}-\tau S^{(l)} \nabla _{f}H(f^{(k)},b^{(k)}))$
Step 2: $b^{(k+1)} \leftarrow Q_{h^{(k)}}^r(b^{(k)})$
Step 3: $f^{(k+1)} \leftarrow P_{\mathbb {R}^{d}_{+}}\left(f^{(k)}-\tau S^{(l)} \nabla _{f}H(f^{(k)},b^{(k+1)}) \right)$
until 'convergence'

To close this section, we compare the PAPA with four existing algorithms, namely, the AEM [7], the nested EM-TV algorithm [41], the nested iterative algorithm for convex constrained problem [12] and the preconditioned primal–dual algorithm (P-PD) [37, 43].

The AEM algorithm, which is a variant of the extragradient method, was developed to solve the saddle-point formulation of the Poisson-TV model (7). We point out that the AEM appears to be a special case of the proposed algorithm 1 with the trivial preconditioner S = I and inner iteration number r = 1, but it was developed based on a different principle. The introduction of inner iteration number r and a nontrivial preconditioner S in the PAPA allows us to develop more efficient reconstruction algorithms.

We now compare the PAPA with the nested EM-TV algorithm. Actually, the nested EM-TV requires solving an optimization problem of the form

$\begin{eqnarray*} &&\mathrm{argmin}\left\lbrace \frac{1}{2}\left\langle f-f^{(k+\frac{1}{2})}, \frac{A^\top 1}{f^{(k)}} \odot (f-f^{(k+\frac{1}{2})}) \right\rangle +\lambda \varphi (B f): f \in \mathbb {R}^d_+\right\rbrace \end{eqnarray*}$

exactly in each outer iteration. However, the PAPA does not need to do this and it leads to a more efficient algorithm. This will be demonstrated by numerical examples in the next section.

Next, we compare the PAPA with the nested iterative algorithm proposed in [12] for solving convex constrained problems. The paper combined the forward–backward and the Douglas–Rachford iterations together to minimize the sum of two functions over a convex set. The validity of the resulting algorithm requires at least one of the two functions differentiable. However, the PAPA is developed based on a fixed-point characterization (in terms of the proximity operator) of the solutions of the Poisson-TV model (7), which does not necessarily require any term of the objective function to be differentiable.

Finally, we compare the PAPA with the P-PD algorithm developed in [43], which prototyped several convex optimization problems for CT image reconstruction with the primal–dual (PD) algorithm, proposed earlier in [11]. Within each complete iterate step, the PD algorithm introduces an extrapolation step based on the current and previous iterates, which can be seen as an approximate extragradient step. In contrast, the PAPA computes the primal leading point h^(k) by taking an extragradient step based on the current iterate only. Besides, the PAPA made good use of the sparsity of the difference matrix B by carrying out the inner iteration to reduce the overall computational cost, while the PD algorithm did not. Indeed, matrix A is much denser than matrix B, and hence, computation with A or A^⊤ is more costly than that with B. As we have pointed out earlier, by carrying out the inner iteration with an appropriate iteration number r, we could greatly accelerate the whole iterative scheme. Moreover, following the idea in [37], in [43] a P-PD algorithm was developed for model (7) using fixed diagonal preconditioners. However, in the PAPA, we proposed a dynamic preconditioner motivated from the EM algorithm, which proves to be more efficient and converges faster in the numerical experiments presented in the next section.

6. Numerical experiments

In this section, we report numerical results obtained from computational experiments for the proposed algorithms. We compare our algorithms with the conventional EM-TV algorithm [35] and the nested EM-TV algorithm [41] in terms of the reconstruction quality and computational performance. We also compare the convergence speed of our proposed algorithms with that of the P-PD algorithm developed in [43].

6.1. Simulated SPECT projection data

We created a digital cylindrical emission phantom with uniform mean background activity distribution (i.e. with uniform mean number of nuclear disintegrations per time unit and per unit volume) and sets of seven hot spheres and seven cold spheres embedded in the cylinder. The hot and cold spheres simulate hyperperfused and hypoperfused defects, respectively. Such defects are of interest in nuclear medicine and one of the main tasks of ECT is detection of such defects. Mean activities across all spheres are uniform. The mean activity ratio of hot:background:cold areas is 40:10:1, respectively. The pixel size used is 0.172 cm. The phantom dimensions are: base radius 84 pixels and length 128 pixels. The spheres radii are 3, 4, 5, 6, 7, 9 and 14 pixels. Their centers are in slices 33 and 97. The locations of spheres in transaxial and sagittal planes are shown in figure 1. The spheres are separated by a uniform region located between slices 47 and 82. The mean activity distribution in the phantom represents the mean radiotracer distribution, i.e. the image f in (1). The parallel-collimator SPECT projection data for our experiments consist of 120 views in 256 × 128 matrix with pixel size 1.78 mm and were generated using analytical pixel-wise discretized projector A in (1) with 20 rays per detector bin [45]. The generated data follow Poisson probability distribution created by a random number generator and the total number of detector counts in 120 views equal to 1.79 × 10⁶ and 1.947 × 10⁷ corresponding to approximately 10: 1 total activities ratio. Neither attenuation nor scatter was modeled and an ideal detector was assumed. Each image in these projection sets was then downsampled to a 128 × 64 matrix with the pixel size 3.56 mm. The coefficient of variation (CV) defined as the ratio of the standard deviation to the mean in the projection image of a sufficiently large uniform region away from edges is used as a measure of Poisson noise in the noisy projection data. Using such a definition, our projection sets are characterized at 56% and 15% noise levels, respectively (figure 2). For the sake of convenience, the phantom with higher (less noisy) and lower total activity (more noisy) is called 'low-noise phantom' and 'high-noise phantom', respectively. The low-noise phantom corresponds to clinically realistic SPECT data, while the high-noise phantom SPECT data would result in clinically unacceptable high noise in the reconstructed images if treated with the conventional reconstruction techniques.

**Figure 1.** Morphology of numerical phantom used: (a) transaxial cross-section (slice 17) through the centers of all hot spheres; (b) transaxial cross-section (slice 49) through the centers of all cold spheres; (c) sagittal cross-section through the centers of the two largest spheres. The mean activity ratio of hot:background:cold areas is 40:10:1, respectively. The sphere radii are counterclockwise: 3, 6, 4, 5, 7, 9 and 14 (center) pixels, respectively. The pixel size used is 0.172 cm.
Download figure:
Standard image

Examples of one noise realization for one projection view are shown in figure 2.

6.2. Numerical studies of the proposed algorithms

In this subsection, we assess the numerical performance of the proposed algorithms: algorithms 1–3 in image reconstruction for the SPECT projection data generated from the low-noise and high-noise phantoms. Specifically, we consider three issues related to the proposed algorithms: the necessity of the preconditioning, the convergence of the proposed preconditioner (60) and the role of the multiple iteration steps in the inner iteration.

In our numerical experiments, the regularization parameter λ in the Poisson-TV model (7) is chosen by adopting the bias–noise curve method. Specifically, we calculated bias and the CV in a reconstructed background region of interest for candidate regularization parameters (ranging from 1 × 10⁻⁴ to 1), and obtained the bias–CV curve. When choosing the optimal parameters, we consider the best trade-off between bias and CV. Furthermore, in order to improve the statistical accuracy, we generated three SPECT projection data sets with different noise realizations from the low-noise and the high-noise phantom, respectively, and evaluated the mean values of bias and CV with respect to the three different noise realizations. In particular, in our numerical experiments, the regularization parameter λ in the Poisson-TV model (7) is chosen to be 0.1 and 0.2 for the SPECT projection data sets generated from low-noise and high-noise phantom, respectively. The constant γ in the model is set to be 0.01 for both phantoms. Let $\lbrace f^{(k)}: k\in \mathbb {N}\rbrace$ be a sequence generated by either algorithm 1, 2 or 3. For a pre-given tolerance tol, the iterative process of an algorithm is terminated if the following requirement is satisfied:

$\begin{equation} {\Vert f^{(k)}-f^{(k+1)} \Vert }/{\Vert f^{(k+1)} \Vert }\le \mathrm{tol}. \end{equation} \tag{ 61 }$

The parameters in algorithms 1–3 are specified as follows. For algorithm 1, without preconditioning, (S = I) we choose β to be 10⁷ times of the upper bound suggested by theorem 4.7 and μ := 1/(2β||B||²₂). For algorithm 1 with the fixed diagonal preconditioner $\mathrm{diag}\left( \frac{1}{A^{\top } 1} \right)$ , we set β = λ and $\mu =1/(2\lambda \Vert B \Vert _{2}^2 \Vert \frac{1}{A^{\top }1}\Vert _{\infty })$ . For algorithms 2 and 3, we set β = λ and $\mu =1/(2\lambda \Vert B \Vert _{2}^2 \Vert \frac{f^{(k)}}{A^{\top }1}\Vert _{\infty })$ in their kth iterations.

In the first experiment, we compare the performance of algorithm 1 without preconditioning (S = I) with that of algorithm 1 with the fixed diagonal preconditioner $\mathrm{diag}\left( \frac{1}{A^{\top } 1} \right)$ and that of algorithms 2 and 3. The parameter r for all the algorithms is fixed at 10 and the other parameters for each algorithm are chosen as discussed above. When implementing algorithm 3, in the first 100 iterations, we use the dynamic preconditioner and after 100 iterations, we fix the preconditioner. Table 1 gives a summary of the CPU times (process times) and numbers of the complete iterations for reconstructing images from the noisy SPECT projection data set generated from low-noise phantom. It clearly shows that algorithms 2 and 3, which have dynamic and semi-dynamic EM-preconditioners, respectively, converge significantly faster than algorithm 1 with either preconditioner. We remark that algorithm 1 with either preconditioner cannot meet the stopping criteria for the last tolerance level. This phenomenon is marked by (–, –) in the table. Therefore, in the remaining part of this section, only algorithm 3 will be used to compare with other existing algorithms for the Poisson-TV model (7). For comparison, we include in table 1 the numerical results for the P-PD algorithm [43], which also uses a fixed preconditioner. Its performance is even worse than that of algorithm 1 with the fixed diagonal preconditioner $\mathrm{diag}\left( \frac{1}{A^{\top } 1} \right)$ .

Table 1. The pair ( ·, ·) represents the CPU time and the number of complete iterations used in algorithms 1–3 and P-PD algorithm with different choices of tol for the noisy SPECT projection data set generated from low-noise phantom. T represents the diagonal matrix $\mathrm{diag}\left( \frac{1}{A^{\top } 1} \right)$ .

tol Algorithm	10⁻¹	10⁻²	10⁻³	10⁻⁴	10⁻⁵	10⁻⁶	10⁻⁷
Algorithm 1 (S = I)	(5.4, 2)	(70.2, 27)	(250.9, 99)	(1359.6, 525)	(4300.9, 1653)	(10 241.8, 3940)	(–, –)
Algorithm 1 (S = T)	(5.4, 2)	(57.5, 23)	(210.5, 85)	(1088.5, 445)	(3204.9, 1301)	(7383.4, 3000)	(–, –)
Algorithm 2	(12.6, 5)	(35.0, 14)	(109.6, 44)	(291.1, 117)	(768.6, 309)	(1915.6, 770)	(5435.0, 2109)
Algorithm 3	(12.3, 5)	(34.8, 14)	(109.6, 44)	(305.7, 123)	(759.5, 307)	(1464.3, 592)	(3954.8, 1599)
P-PD Algorithm	(6.6, 2)	(75.9, 23)	(292.7, 89)	(1569.1, 479)	(4357.6, 1331)	(9885.1, 3024)	(–, –)

The proposed preconditioner (60) depends on the current iterate and is updated at each iterate step. The second experiment explores numerically how the preconditioner (60) changes in iterations in terms of its ℓ²-norm and Frobenius norm. Figure 3 presents the curves of both norms versus iteration numbers for the low-noise and high-noise phantoms. From the figure, we conclude that after a few iterations, the change of the preconditioner (60) is negligible. This suggests that except for the first few iterations, the preconditioner may be chosen to be the same. In this regard, the convergence result (theorem 4.7) that we establish for the fixed preconditioner is applicable to the practical situation.

**Figure 3.** Curves of (a) ℓ²-norm and (b) Frobenius norm of the preconditioner (60) versus iteration numbers for low-noise and high-noise phantoms.
Download figure:
Standard image

The third experiment tests numerically the choice of the parameter r in the inner iteration for algorithm 3 (PAPA) in terms of the CPU time that the algorithm requires to reach the stopping criterion (61) for different tolerances. We perform the PAPA with r from 6 to 15 for the noisy SPECT projection data sets generated from low-noise and high-noise phantoms with the tolerance tol = 10⁻⁶. The numerical results for the noisy SPECT projection data set generated from the low-noise phantom show that the algorithms with the parameter r ranging from 7 to 11 perform comparably and are better than those with other ranges of r. The numerical results for the noisy SPECT projection data set generated from the high-noise phantom indicate that the algorithms with the parameter r ranging from 10 to 12 perform comparably and are better than those with other ranges of r. According to this numerical observation, in the remaining part of this section, we fix the parameter r for the PAPA at 10.

6.3. Comparison of the PAPA with the EM-TV algorithm

We now compare the performance of the proposed PAPA with that of the conventional EM-TV [35], in terms of the CPU time, local image quality metrics: the CV measured within the uniform region of interest, contrast-to-noise ratio (CNR) and a global image quality metric, i.e. the normalized mean-squared error (NMSE).

We first recall the conventional EM-TV algorithm. For a given positive number δ, the smoothed version of the TV is a function $R:\mathbb {R}^{d}\rightarrow \mathbb {R}$ defined at $u\in \mathbb {R}^{d}$ as

$\begin{equation} R(u):=\sum _{i=1}^{d} \left\Vert [(Bu)_i, (Bu)_{d+i}, (Bu)_{2d+i}, \delta ]^\top \right\Vert . \end{equation} \tag{ 62 }$

With this function R, the conventional EM-TV algorithm [35] is described in algorithm 4.

.

Algorithm 4 (Conventional EM-TV algorithm for MAP ECT reconstruction)
Initialization: f⁽⁰⁾ = 1.
repeat
$f^{(k+1)} \leftarrow \mathrm{diag}\left(\frac{f^{(k)}}{A^{\top }1+\lambda \nabla R(f^{(k)})}\right) A^{\top }\left(\frac{g}{Af^{(k)}+\gamma }\right)$ .
until 'convergence'

For reconstructions with the conventional EM-TV algorithm, following [35], we choose δ = 0.001 which is less than 1% of the maximum value of the phantom. Moreover, we note that when λ vanishes the conventional EM-TV algorithm reduces to the EM algorithm (59).

We now evaluate the quality of the reconstructed images. First, we compare noise in the reconstructed images. We use the CV within a uniform region of interest as a surrogate of noise measure in the reconstructed images obtained using algorithm 2 and the conventional EM-TV. The CV of an image f^∞ reconstructed by an algorithm is defined by

$\begin{equation*} \mathrm{CV}:=\frac{\mathrm{SD}_{\Omega }(f^{\infty })}{\mathbb {E}_{\Omega }(f^{\infty })}, \end{equation*}$

where SD_Ω( · ) and $\mathbb {E}_{\Omega }(\cdot )$ denote the standard deviation and the mean of the reconstructed activities over a region Ω. In our case, the region Ω is a cylinder with the radius of base and the height equal to 25 and 8 pixels, respectively. This cylindrical region lies between the hot spheres and the cold spheres (refer to figure 1(c)) and does not intersect with any hot or cold spheres. Under these circumstances, the means of CVs (with respect to five different noise realizations) for the PAPA and the conventional EM-TV are 0.12% and 3.81%, respectively, for the low-noise phantom, and are 4.09% and 13.15%, respectively, for the high-noise phantom. From these numerical results, we find that the PAPA performs considerably (factor of 31 for low-noise phantom data and factor of 3 for high-noise phantom data) better than the conventional EM-TV in terms of the noise.

Next, a local quality metric, CNR, of images will be used to measure the quality of the reconstructed hot or cold spheres. Too low CNR might result in inability to detect the lesion by an observer. The CNR for a reconstructed image f^∞ is defined as a ratio of a lesion contrast to the background noise [4, 28, 38]

$\begin{eqnarray*} &&\mathrm{CNR}:=\frac{|\mathbb {E}_{\Omega _T}(f^{\infty })-\mathbb {E}_{\Omega _B}(f^{\infty })|}{\mathrm{SD}_{\Omega _B}(f^{\infty })}. \end{eqnarray*}$

Here, ${\mathbb {E}_{\Omega _T}(f^{\infty })}$ is the mean reconstructed activity in the region Ω_T where a specific hot (resp. cold) sphere is located, while Ω_B is a spherical region with the same diameter as the hot (resp. cold) sphere but located within the uniform region of the cylindrical phantom that does not intersect with any hot or cold sphere. The CNR metric is important in radiology because it allows one to assess detectability of a lesion, which is one of the main tasks of ECT. It is well established by human observer studies that the lesion contrast and the background noise influence lesion detectability [28]. We note that even when the signal-to-noise ratio is high, the presence of a significant bias in the reconstructed image might result in the lesion contrast being too low for a lesion to be detected. Very high noise or correlation of the noise will contribute to low detectability of the lesion even if the lesion contrast is high [33]. The curves of the means of CNRs (with respect to five different noise realizations again) versus diameters of spheres are plotted in figure 4 . Again, we observe that the PAPA very significantly (factor of 7–14 for high-noise phantom) outperforms the conventional EM-TV in terms of the CNR values.

**Figure 4.** Reconstructed image quality assessment for hot and cold spheres: plots of mean (with respect to five different noise realizations) CNR versus diameter of sphere. The vertical error bars denote the calculated standard deviation of the mean. For the EM-TV, the error bars are smaller than the symbols used: (a) hot spheres in low-noise phantom, (b) cold spheres in low-noise phantom, (c) hot spheres in high-noise phantom and (d) cold spheres in high-noise phantom. Open squares denote the conventional EM-TV. Open circles denote the PAPA.
Download figure:
Standard image

We use the NMSE to assess accuracy of reconstructions. The NMSE is a global image quality metric. It quantifies the difference between the activity reconstruction f^∞ and the true mean activity f in the whole object. It is defined by

$\begin{equation*} \mathrm{NMSE}:=\frac{\Vert f-f^\infty \Vert ^2}{\Vert f\Vert ^2}. \end{equation*}$

The mean values of NMSE (with respect to five different noise realizations) for the images reconstructed by the PAPA and the conventional EM-TV are 0.485 and 0.787, respectively, for the low-noise phantom, and 2.55 and 3.93, respectively, for the high-noise phantom. Once again, the PAPA significantly outperforms the conventional EM-TV in terms of the NMSE values.

6.4. Comparison of the PAPA with the nested EM-TV algorithm

In this subsection, we compare the performance of the PAPA with that of the nested EM-TV algorithm, the algorithm recently described in [41].

In algorithm 5, we describe the nested EM-TV algorithm for ECT reconstruction.

.

Algorithm 5 (Nested EM-TV algorithm for MAP ECT reconstruction)
Initialization: f⁽⁰⁾ = 1.
repeat
`EM Step`: $f^{(k+\frac{1}{2})} \leftarrow \mathrm{diag}\left(\frac{f^{(k)}}{A^{\top }1}\right) A^{\top }\left(\frac{g}{Af^{(k)}+\gamma }\right)$ .
`TV Step`: $f^{(k+1)} \leftarrow \mathrm{argmin}\left\lbrace \frac{1}{2}\left\langle f-f^{(k+\frac{1}{2})}, \frac{A^\top 1}{f^{(k)}} \odot (f-f^{(k+\frac{1}{2})}) \right\rangle +\lambda \varphi (B f): f \in \mathbb {R}^d_+\right\rbrace .$
until 'convergence'

This algorithm has two major steps, i.e. the EM step and the TV correction step, and they are performed alternately. The EM step is identical to (59). The TV correction step is a modified version of the Rudin–Osher–Fatemi (ROF) model and was implemented by exploiting an existing scheme for the ROF model. In fact, it was stated in [41] that the TV correction step in algorithm 5 was carried out by adopting Chambolle's method originally reported in [10]. We shall follow the suggestion made in [41] to apply Chambolle's method with ten iterations for each TV correction step to achieve an approximate solution.

In table 2, we list the CPU time expended and the number of complete iterations used by the PAPA, the nested EM-TV and the conventional EM-TV, for the two SPECT projection data sets. After examining the table, we conclude that under the same stopping criteria the PAPA is better than the nested EM-TV and significantly better than the conventional EM-TV in terms of the convergence speed. We remark that the conventional EM-TV algorithm cannot even meet the stopping criteria for most of the given tolerance levels.

Table 2. Comparison of the performance of PAPA, the conventional EM-TV and the nested EM-TV for the test data. The pair ( ·, ·) represents the CPU time and the number of complete iterations used.

tol Algorithm	10⁻¹	10⁻²	10⁻³	10⁻⁴	10⁻⁵	10⁻⁶	10⁻⁷
Low-noise phantom
PAPA	(12.3, 5)	(34.8, 14)	(109.6, 44)	(305.7, 123)	(759.5, 307)	(1464.3, 592)	(3954.8, 1599)
Nested EM-TV	(14.1, 5)	(40.0, 14)	(122.8, 43)	(333.1, 117)	(872.4, 307)	(2207.3, 777)	(6184.5, 2179)
EM-TV	(7.15, 4)	(–, –)	(–, –)	(–, –)	(–, –)	(–, –)	(–, –)
High-noise phantom
PAPA	(12.4, 5)	(35.3, 14)	(106.2, 42)	(295.3, 117)	(724.7, 289)	(1451.6, 580)	(4349.4, 1741)
Nested EM-TV	(13.2, 5)	(38.1, 14)	(115.2, 42)	(304.5, 111)	(798.8, 291)	(2024.3, 737)	(5593.6, 2045)
EM-TV	(9.01, 5)	(–, –)	(–, –)	(–, –)	(–, –)	(–, –)	(–, –)

In addition to the CPU time and the number of complete iterations, we further estimate the number of arithmetic operations for the PAPA and the nested EM-TV algorithm. We list in table 3 the number of total arithmetic operations required by the PAPA and the nested EM-TV, under various stopping criteria.

Table 3. Comparison of arithmetic operations of semi-dynamic PAPA and the nested EM-TV for the test data. The number ( · ) represents ( · ) × 2²⁰ arithmetic operations required.

Low-noise phantom

tol Algorithm	10⁻¹	10⁻²	10⁻³	10⁻⁴	10⁻⁵	10⁻⁶	10⁻⁷
PAPA	3380	9464	29 744	83 148	207 532	400 192	1080 924
Nested EM-TV	3795	10 626	32 637	88 803	233 013	589 743	1653 861
High-noise phantom
PAPA	3380	9464	28 392	79 092	195 364	392 080	1176 916
Nested EM-TV	3795	10 626	31 878	84 249	220 869	559 383	1552 155

Furthermore, we compare the image quality metrics of the reconstructed images obtained by using the PAPA and by the nested EM-TV. The means of CVs for the reconstructions by the PAPA and the conventional EM-TV are reported in table 4 . The means of CNRs with respect to five different noise realizations versus the diameters of the hot spheres (resp. cold spheres) for the PAPA, the nested EM-TV and the conventional EM-TV are listed in table 5 (resp. 6). Moreover, the means of NMSEs with respect to five different noise realizations for the images of the low-noise phantom and the high-noise phantom are presented in table 7. We conclude from these tables that both the PAPA and the nested EM-TV outperform the conventional EM-TV in terms of the values of CV, CNR and NMSE, while the PAPA performs comparably with the nested EM-TV.

Table 4. Means of CVs for the reconstructions by the PAPA, the nested EM-TV and the conventional EM-TV.

Phantom	Low-noise phantom	High-noise phantom
PAPA	0.12%	4.09%
Nested EM-TV	0.14%	4.54%
EM-TV	3.81%	13.15%

Table 5. Means of CNRs for seven hot spheres reconstructed using the PAPA, the nested EM-TV and the conventional EM-TV.

Low-noise phantom

Diameters of spheres	14	9	7	6	5	4	3
PAPA	1915.768	1590.557	1574.134	1451.297	1280.732	1043.835	611.271
Nested EM-TV	1912.141	1587.496	1571.388	1448.454	1277.969	1041.962	610.677
EM-TV	65.801	55.159	56.717	51.872	47.284	40.451	28.124
High-noise phantom
PAPA	52.171	41.019	37.525	31.919	28.247	13.939	5.448
Nested EM-TV	52.149	41.002	37.510	31.905	28.234	13.932	5.445
EM-TV	17.007	13.116	11.885	9.815	8.389	4.237	2.232

Table 6. Means of CNRs for seven cold spheres reconstructed using the PAPA, the nested EM-TV and the conventional EM-TV.

Low-noise phantom

Diameters of spheres	14	9	7	6	5	4	3
PAPA	543.563	427.444	330.512	283.982	229.776	97.887	14.505
Nested EM-TV	542.524	426.538	330.087	283.587	229.380	97.331	14.437
EM-TV	18.721	15.195	12.599	11.744	10.610	6.785	3.993
High-noise phantom
PAPA	14.378	10.106	6.577	8.570	1.543	0.424	0.764
Nested EM-TV	14.373	10.105	6.576	8.568	1.542	0.424	0.762
EM-TV	4.929	3.544	2.484	3.128	1.310	0.724	0.839

Table 7. Means of NMSEs for the reconstructions by the PAPA, the nested EM-TV and the conventional EM-TV.

Phantom	Low-noise phantom	High-noise phantom
PAPA	0.484 69	2.547 01
Nested EM-TV	0.484 81	2.547 94
EM-TV	0.786 70	3.925 71

In order to qualitatively compare the image reconstruction quality, in figures 5 and 6, we present selected transaxial cross-sections through the reconstructed images of the hot and cold spheres, respectively. Images reconstructed by the PAPA, the nested EM-TV and the conventional EM-TV for two noise levels (56% and 15%) are shown. We observe much higher background noise in the images reconstructed by the conventional EM-TV, compared to the PAPA and the nested EM-TV. The quality of images reconstructed by the PAPA and the nested EM-TV is similar. The hot sphere with four-pixel radius cannot be detected in the images reconstructed from high-noise data by the conventional EM-TV. However, it is detectable in the reconstructions performed using the PAPA and the nested EM-TV. The cold spheres with six- and seven-pixel radii are poorly visible in the images reconstructed from high-noise data by the conventional EM-TV. However, they are easily detectable in the reconstructions performed using the PAPA and the nested EM-TV.

**Figure 5.** Transaxial cross-sections through the centers of the hot spheres (slice 17) in images reconstructed for low-noise phantom: (a) PAPA, (b) nested EM-TV and (c) conventional EM-TV. Transaxial cross-sections through the centers of the hot spheres (slice 17) in images reconstructed for high-noise phantom: (d) PAPA, (e) nested EM-TV and (f) conventional EM-TV. The calibration bars indicates the linear mapping between the reconstructed activity and the gray scale used.
Download figure:
Standard image

**Figure 6.** Transaxial cross-sections through the centers of the cold spheres (slice 49) in images reconstructed for low-noise phantom: (a) PAPA, (b) nested EM-TV and (c) conventional EM-TV. Transaxial cross-sections through the centers of the cold spheres (slice 49) in images reconstructed for high-noise phantom: (d) PAPA, (e) nested EM-TV and (f) conventional EM-TV. The calibration bars indicates the linear mapping between the reconstructed activity and the gray scale used.
Download figure:
Standard image

To better access the differences between images reconstructed using the PAPA and the nested EM-TV, we obtain line profiles through selected transaxial cross-sections of the reconstructed images containing the hot and cold spheres. They are shown in figures 7 and 8 for hot and cold spheres, respectively. We observe that for hot sphere reconstructions PAPA provides images with slightly better contrast and spatial resolution, compared to the nested EM-TV. For cold sphere reconstructions both algorithms perform similarly. The conventional EM-TV reconstructions are inferior in any case.

**Figure 7.** One-pixel wide line profiles through the centers of hot spheres with 7, 14 and 3-pixel radii in transaxial cross-sections (slice 17) of: (a) high-noise phantom, (b) image reconstructed using the PAPA from the high-noise phantom data, (c) image reconstructed using the nested EM-TV from the high-noise phantom data and (d) image reconstructed using conventional EM-TV from the high-noise phantom data.
Download figure:
Standard image

**Figure 8.** One-pixel wide line profiles through the centers of cold spheres with 7-, 14- and 3-pixel radii in transaxial cross-sections (slice 49) of (a) high-noise phantom, (b) image reconstructed using the PAPA from the high-noise phantom data, (c) image reconstructed using the nested EM-TV from the high-noise phantom data and (d) image reconstructed using conventional EM-TV from the high-noise phantom data.
Download figure:
Standard image

7. Concluding remarks

There is a great need to reduce radiation dose to patients undergoing ECT examinations. This could be accomplished by lowering the total amount of activity in the radiotracer administered. However, it would lead to very high Poisson noise in the raw ECT data. In turn, such very noisy data, if treated by conventional techniques, such as EM-TV or OSEM, would lead to very noisy and clinically unacceptable reconstructed images. To attain good quality ECT reconstructions from low-dose ECT examinations, we propose a PAPA for solving the MAP ECT reconstruction problem. We prove that the algorithm enjoys nice theoretical convergence results in the case that the preconditioner is fixed. Motivated by the classical EM algorithm, we propose dynamic and semi-dynamic EM-preconditioners for the PAPA to accelerate the convergence of the original iterative scheme, which is the main contribution of this work. We demonstrate in the numerical experiments that the EM-preconditioner converges fast to a fixed preconditioning matrix, which in turn confirms the applicability of the convergence result to the practical situation. Since the TV-based penalty function can well preserve the edges and details of the reconstructed object, we particularly concentrate on the example with TV regularization. Based on the numerical experiments performed in this work, we observe that the alternating projection algorithm with the EM-preconditioner significantly outperforms the conventional EM-TV in all aspects including the convergence speed, the noise in the reconstructed images and the image quality. It also outperforms the recently developed algorithm—the nested EM-TV—in the convergence speed while having comparable image quality.

We conclude that the developed alternating projection algorithm with dynamic or semi-dynamic EM-preconditioner might allow very significant reduction in the radiation dose to patients imaged using ECT by providing the same CNR for hot and cold lesions as the conventional EM-TV, but with the total administered radiotracer activity two to six times lower than presently used in ECT examinations reconstructed using the conventional EM-TV.

Acknowledgments

This work was supported in part by US Air Force Office of Scientific Research under grant FA9550-09-1-0511, by the US National Science Foundation under grants DMS-0712827, DMS-1115523, by the Natural Science Foundation of China under grants 11071286 and 91130009 and by Guangdong Provincial Government of China through the 'Computational Science Innovative Research Team' program. This work was also supported in part by award 5-28527 from the National Center for Research Resources (NCRR), a component of the National Institutes of Health (NIH) and the NIH Roadmap for Medical Research, and by the Center of Emerging and Innovative Sciences (CEIS), a NYSTAR-designated Center for Advanced Technology. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Center for Research Resources or the National Institutes of Health. This work was supported in part by Carol M Baldwin Breast Cancer Research Award.

Preconditioned alternating projection algorithms for maximum a posteriori ECT reconstruction

Article metrics

Submit

Permissions

Author e-mails

Author affiliations

Author notes

Dates

Abstract

1. Introduction

2. Maximum a posteriori estimation for ECT reconstruction

3. Characterizations and algorithms

4. Convergence analysis

5. TV-regularized MAP ECT reconstruction

6. Numerical experiments

6.1. Simulated SPECT projection data

6.2. Numerical studies of the proposed algorithms

6.3. Comparison of the PAPA with the EM-TV algorithm

6.4. Comparison of the PAPA with the nested EM-TV algorithm

7. Concluding remarks

Acknowledgments

Preconditioned alternating projection algorithms for maximum a posteriori ECT reconstruction

Article metrics

Submit

Permissions

Share this article

Author e-mails

Author affiliations

Author notes

Dates

Abstract

1. Introduction

2. Maximum a posteriori estimation for ECT reconstruction

3. Characterizations and algorithms

4. Convergence analysis

5. TV-regularized MAP ECT reconstruction

6. Numerical experiments

6.1. Simulated SPECT projection data

6.2. Numerical studies of the proposed algorithms

6.3. Comparison of the PAPA with the EM-TV algorithm

6.4. Comparison of the PAPA with the nested EM-TV algorithm

7. Concluding remarks

Acknowledgments