
Semi-convergence properties of Kaczmarz's method

Tommy Elfving, Per Christian Hansen and Touraj Nikazad

Published 1 May 2014 © 2014 IOP Publishing Ltd

Citation: Tommy Elfving et al 2014 Inverse Problems 30 055007. DOI: 10.1088/0266-5611/30/5/055007

Abstract

Kaczmarz's method—sometimes referred to as the algebraic reconstruction technique—is an iterative method that is widely used in tomographic imaging due to its favorable semi-convergence properties. Specifically, when applied to a problem with noisy data, during the early iterations it converges very quickly toward a good approximation of the exact solution, and thus produces a regularized solution. While this property is generally accepted and utilized, there is surprisingly little theoretical justification for it. The purpose of this paper is to present insight into the semi-convergence of Kaczmarz's method as well as its projected counterpart (and their block versions). To do this we study how the data errors propagate into the iteration vectors and we derive upper bounds for this noise propagation. Our bounds are compared with numerical results obtained from tomographic imaging.


1. Introduction

Computed tomography and many other large-scale imaging problems lead to large linear systems of equations (often inconsistent) with noisy data, of the form:

$Ax = b, \qquad b = \bar{b} + \delta b, \qquad\qquad (1)$

where $\bar{b} = A \, \bar{x}$ denotes the exact data and δb is the perturbation consisting of additive noise. The matrix A comes from the discretization of an ill-posed linear problem, such as the Radon transform, and $\bar{x}$ denotes the exact image. The numerical solution of these systems calls for the use of iterative regularization methods, and it is often necessary to incorporate additional constraints into the reconstruction, e.g. non-negativity constraints.
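As a concrete illustration of (1), the following minimal Python sketch (our own; a random matrix stands in for the discretized forward operator, and all names and sizes are illustrative assumptions) builds a noisy right-hand side with a prescribed relative noise level.

import numpy as np

rng = np.random.default_rng(0)
m, n = 60, 40
A = rng.standard_normal((m, n))        # stand-in for the system matrix
xbar = rng.standard_normal(n)          # exact solution
bbar = A @ xbar                        # exact data
eta = 0.01                             # relative noise level
delta_b = rng.standard_normal(m)
delta_b *= eta * np.linalg.norm(bbar) / np.linalg.norm(delta_b)
b = bbar + delta_b                     # noisy right-hand side as in (1)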

We focus here on regularizing iterations with semi-convergence, where the iteration number plays the role of the regularization parameter. In the early stages, the iteration vector approaches a regularized solution, while continuing the iteration leads to iteration vectors deteriorated by noise, cf [2, 3, 8, 16–18, 20, 22, 25]. Some of these papers also discuss the issue of finding a good value for the relaxation parameter (also allowing non-stationary iteration). Recent papers dealing with the important problem of defining stopping criteria are [1] and [30].

Our goal is to study the regularizing properties of Kaczmarz's method, also known as the algebraic reconstruction technique (ART). To do this, we split the error in the iteration vectors into two components: an iteration error and a noise error—the latter due to the noise in the data—and we derive upper bounds for the noise error. The regularizing effects of this method have been discussed and investigated experimentally by several authors, e.g. [8, 34]. To our knowledge, however, a bound for the noise error has not been derived previously.

Our paper is organized as follows. Section 2 presents numerical results that demonstrate the regularizing properties of Kaczmarz's method and give some computational insight into its advantage over other algebraic iterative methods. Section 3 sets the stage for our analysis by presenting the semi-convergence theory for simultaneous iterative reconstruction methods. Our main results are then presented in section 4 where we derive an upper bound for the noise propagation in the (block) Kaczmarz method, both in its 'standard' form and with projections. We also extend the analysis to a more general class of sequential block iterations. We conclude with numerical examples in section 5. Throughout the paper, we use the following notation:

  • || · || denotes the vector and matrix 2-norm.
  • By diag(zi), where the zi can be either scalars or matrices, we mean the (block) diagonal matrix with $z_1, z_2, \ldots$ on its diagonal.
  • With M a symmetric positive definite (SPD) matrix, $\Vert x\Vert _M = \sqrt{x^TMx}$ denotes a weighted Euclidean norm, and $M^{1/2}$ the square root of M.
  • With A a rectangular matrix, $\sigma_i(A)$ denotes its ith singular value; the singular values are ordered such that $\Vert A \Vert = \sigma_1 \geqslant \sigma_2 \geqslant \cdots \geqslant \sigma_r$ with r = rank(A). Moreover, $\mathcal {R}(A)$ and $\mathcal {N}(A)$ denote the range and null space, respectively, of A.
  • With Q a square matrix, ρ(Q) is its spectral radius and $\lambda_i(Q)$ denotes its ith eigenvalue; we recall that:
    $\sigma_i^2(Q) = \lambda_i(Q^TQ), \qquad\qquad (2)$
    assuming the same ordering of the singular values and the eigenvalues.
  • $\mathcal {C}$ denotes a closed convex set and $P_{\mathcal {C}}\!$ is the metric projection onto this set.

2. Motivation

Kaczmarz's algorithm comes in many variants—see, e.g. [22, chapter 11] and [26, 31]. It is often observed that it converges faster than other algebraic iterative methods such as Landweber and Cimmino. To illustrate this we consider two specific algorithms. The first method is the projected Cimmino algorithm with the iteration

$x_{k+1} = P_{\mathcal{C}}\big(x_k + \omega A^TM(b - Ax_k)\big),$

where $M = M_{\mathrm{C}} = {\textstyle \frac{1}{m}}\mathrm{diag}(1/\Vert a_i \Vert ^2)$, $0 < \omega < 2\Vert A^TM_{\mathrm{C}}A\Vert^{-1}$, and $a_i^T$ denotes the ith row of A. The second method is the projected symmetric Kaczmarz algorithm with cyclic control where each cycle, starting with $y_{k,1} = x_k$ and j = 0, involves the updating iterations for i = 1, 2, ..., m, m − 1, m − 2, ..., 3, 2, 1

$y_{k,j+1} = y_{k,j} + \omega\, \frac{b_i - a_i^T y_{k,j}}{\Vert a_i \Vert^2}\, a_i, \qquad j := j + 1,$
where 0 < ω < 2. After this cycle we set $x_{k+1} = P_{\mathcal {C}}(y_{k,2m})$. Note that one cycle of this algorithm involves twice as many flops as one iteration of the Cimmino algorithm. It is well known that the symmetric Kaczmarz algorithm can be written in the same form as the Cimmino algorithm with the SPD matrix $M = M_{\mathrm{S}}$ given by

$M_{\mathrm{S}} = (2-\omega)\,(D+\omega L^T)^{-1} D\, (D+\omega L)^{-1},$
based on the splitting AAT = L + D + LT with D diagonal and L strictly lower triangular. We choose these two methods for our example because they can both be analyzed by means of the singular value decomposition (SVD) [20]; but we emphasize that the superiority of Kaczmarz's algorithm and its variants—over Landweber, Cimmino and related methods—holds in general.
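For illustration, a minimal Python sketch of the two methods follows (our own transcription, not code from the paper; the helper names and the box projection are our choices). It shows one projected Cimmino iteration and one cycle of projected symmetric Kaczmarz.

import numpy as np

def project_box(x, lo=None, hi=None):
    # Metric projection P_C onto a box; lo=hi=None gives C = R^n,
    # lo=0 gives non-negativity, and lo=0, hi=1 the box [0,1]^n.
    if lo is None and hi is None:
        return x
    return np.clip(x, lo, hi)

def cimmino_step(A, b, x, omega, proj=lambda v: v):
    # One projected Cimmino iteration: x <- P_C(x + omega A^T M_C (b - A x)).
    m = A.shape[0]
    w = 1.0 / (m * np.sum(A**2, axis=1))      # diagonal of M_C
    return proj(x + omega * (A.T @ (w * (b - A @ x))))

def symmetric_kaczmarz_cycle(A, b, x, omega, proj=lambda v: v):
    # One cycle of projected symmetric Kaczmarz: sweep the rows in the
    # order 1..m, m-1..1, projecting only once at the end of the cycle.
    m = A.shape[0]
    norms2 = np.sum(A**2, axis=1)
    y = x.copy()
    for i in list(range(m)) + list(range(m - 2, -1, -1)):
        y += omega * (b[i] - A[i] @ y) / norms2[i] * A[i]
    return proj(y)

One call to symmetric_kaczmarz_cycle performs the 2m − 1 row updates of a cycle, in line with the flop count mentioned above.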

Figure 1 shows the relative errors $\Vert x_k - \bar{x} \Vert / \Vert \bar{x} \Vert$ versus k for both algorithms (with $\bar{x}$ denoting the exact solution), and with three different choices of $\mathcal {C}$ corresponding to no constraints ($\mathcal {C}=\mathbb {R}^n$), non-negativity constraints ($\mathcal {C} = \mathbb {R}_+^n$), and box constraints ($\mathcal {C} = [0,1]^n$). The matrix is 15 232 × 40 000 and comes from discretization of a parallel-beam tomography problem (more details about this test problem are given in section 5). The right-hand side in (1) was perturbed with white Gaussian noise δb such that $\Vert \delta b \Vert / \Vert \bar{b} \Vert = 0.01$. The starting vector was zero, and for both algorithms we used the optimal relaxation parameters (ω = 473 for Cimmino and ω = 1.05 for symmetric Kaczmarz) as found by AIR Tools [21]. The best reconstructions (marked by the circles in the error histories) are shown in figure 2. Both algorithms achieve approximately the same accuracy, but symmetric Kaczmarz converges much faster than Cimmino.

Figure 1. Error histories for Cimmino's algorithm and the symmetric Kaczmarz algorithm without constraints, with non-negativity constraints, and with box constraints; note the logarithmic k-axis. For both algorithms we used the optimal relaxation parameters.

Figure 2. Exact image $\bar{x}$ and best solutions xk obtained by the methods from figure 1. All solutions are shown as 200 × 200 images.


To obtain more insight into the performance of the two algorithms, we computed the singular values of A as well as of the two matrices $M_{\mathrm{C}}^{1/2}A$ and $M_{\mathrm{S}}^{1/2}A$ associated with Cimmino and symmetric Kaczmarz. For computational reasons we used a smaller matrix of size 7556 × 10 000. The results are shown in figure 3. For both A and Cimmino, we observe the 'standard' behavior where the singular values decay gradually to zero. For symmetric Kaczmarz, on the other hand, there is a large cluster of singular values close to 1, and the first 250 singular values are identical to 1 within rounding errors. The large cluster of singular values close to 1 means that we capture a lot of SVD components in a few cycles. This intuitively explains the fast convergence of symmetric Kaczmarz compared to Cimmino.
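The computation behind figure 3 can be mimicked at small scale; the sketch below (our own illustration on random data, using the expression for $M_{\mathrm{S}}$ given above and scipy's sqrtm for the matrix square root) compares the three sets of singular values.

import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)
m, n, omega = 30, 40, 1.05
A = rng.standard_normal((m, n))

M_C = np.diag(1.0 / (m * np.sum(A**2, axis=1)))   # Cimmino weights

AAT = A @ A.T
D = np.diag(np.diag(AAT))
L = np.tril(AAT, -1)                              # strictly lower part
B = np.linalg.inv(D + omega * L)
M_S = (2 - omega) * B.T @ D @ B                   # SPD for 0 < omega < 2

print("A:             ", np.linalg.svd(A, compute_uv=False)[[0, -1]].round(3))
for name, M in [("Cimmino:       ", M_C), ("sym. Kaczmarz: ", M_S)]:
    sv = np.linalg.svd(np.real(sqrtm(M)) @ A, compute_uv=False)
    print(name, sv[[0, -1]].round(6))             # largest and smallest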

Figure 3. The singular values of A and those of the matrices $M_{\mathrm{C}}^{1/2}A$ and $M_{\mathrm{S}}^{1/2}A$ associated with Cimmino and symmetric Kaczmarz.


An inspection of the right singular vectors—the basis vectors of the solutions—sheds more light on the appearance of the solutions, see figure 4. For the matrix A as well as $M_{\mathrm{C}}^{1/2}A$ for Cimmino we see the traditional behavior, namely, that the vectors provide a spectral basis where low-frequency information is associated with the largest singular values. For symmetric Kaczmarz, we see a very different behavior: all singular vectors of $M_{\mathrm{S}}^{1/2}A$ have low, medium, and high frequencies present in them. Here, the matrix $M_{\mathrm{S}}^{1/2}$ acts as a 'preconditioner' that changes the right singular vectors so that they do not form a spectral basis. The observation that high frequencies are present in all singular vectors explains intuitively why we can compute solutions with sharp edges (and other fine structures) even after a few cycles, because high frequencies are always present in the singular vectors.

Figure 4. Some right singular vectors (shown as 100 × 100 images) of A (top row), $M_{\mathrm{C}}^{1/2}A$ for Cimmino (middle row), and $M_{\mathrm{S}}^{1/2}A$ for symmetric Kaczmarz (bottom row).


3. Setting the stage: analysis of SIRT-type algorithms

Before studying the Kaczmarz algorithm and its projected version, in this section we consider algorithms related to the simultaneous iterative reconstruction technique (SIRT) methods. In particular we study their semi-convergence, and we extend some results from an earlier paper [16] on this subject. It should be noted that a semi-convergence analysis, cf [18, 20], consists of two parts: noise-error analysis and iteration-error analysis (i.e. convergence for exact data). In the following we will use 'convergence' to mean convergence for exact data (as is the usual convention).

3.1. Convergence properties

We consider the following two algorithms—an unconstrained algorithm and its constrained variant.

Choose arbitrary initial vectors $x_0,\hat{x}_0 \in \mathbb {R}^n$ and update the iteration vectors xk and $\hat{x}_k$ by means of

$x_{k+1} = x_k + \omega A^TM(b - Ax_k)$ (U-ALG) (3)

$\hat{x}_{k+1} = P_{\mathcal{C}}\big(\hat{x}_k + \omega A^TM(b - A\hat{x}_k)\big)$ (P-ALG) (4)

Here ω > 0 is the relaxation parameter, and we assume that M is nonsingular—notice that we do not require M to be SPD.

When M is also SPD, several well-known methods can be written in the form (3) for appropriate choices of the matrix M. With M equal to the identity we get the classical (projected) Landweber method, e.g. [18, 20]. Cimmino's method [10] is obtained with $M = M_{\mathrm{C}} = \frac{1}{m}\mathrm{diag}(1/\Vert a_i \Vert ^2)$ where ai denotes the ith row of A. The component averaging method [9] uses $M = \mathrm{diag}(1/\sum _{j=1}^n N_j a_{ij}^2)$ where Nj is the number of non-zeroes in the jth column of A. For applications of the SIRT methods in imaging and related areas see [2, 3, 6, 35, 36].
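A sketch of (3) and (4) with a diagonal M, covering the Landweber, Cimmino and CAV weightings listed above, might look as follows (our own illustration; the function and argument names are assumptions, and proj plays the role of $P_{\mathcal {C}}$).

import numpy as np

def u_alg(A, b, omega, w, n_iter, proj=None, x0=None):
    # U-ALG / P-ALG with diagonal M = diag(w):
    # x <- x + omega A^T M (b - A x), optionally followed by P_C.
    x = np.zeros(A.shape[1]) if x0 is None else x0.copy()
    for _ in range(n_iter):
        x = x + omega * (A.T @ (w * (b - A @ x)))
        if proj is not None:
            x = proj(x)                    # this branch is P-ALG
    return x

def landweber_weights(A):
    return np.ones(A.shape[0])             # M = I

def cimmino_weights(A):
    m = A.shape[0]
    return 1.0 / (m * np.sum(A**2, axis=1))   # M_C = (1/m) diag(1/||a_i||^2)

def cav_weights(A):
    N = np.count_nonzero(A, axis=0)        # N_j = nonzeros in column j
    return 1.0 / (A**2 @ N)                # M = diag(1 / sum_j N_j a_ij^2)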

For the case where M is SPD we define the following constrained weighted least-squares problem associated with the SIRT methods:

$\min_{x \in \mathcal{C}}\ {\textstyle\frac{1}{2}}\, \Vert Ax - b \Vert_M^2. \qquad\qquad (5)$

For the unconstrained problem (${\mathcal {C}}=\mathbb {R}^n$) the minimization problem in (5) always has a solution, whereas the constrained problem has a solution for any right-hand side b if and only if $A({\mathcal {C}})$, the image set of ${\mathcal {C}}$, is closed, cf [7, p 444], [27, p 442]. This is the case, e.g. when ${\mathcal {C}}$ is bounded or when the objective function is coercive, i.e. the null space of A is trivial, $\mathcal {N}(A)=\lbrace 0 \rbrace$. Note that in this case the objective function is strictly convex and a unique solution exists. We have the following convergence result (see, e.g. [4]).

Theorem 1. Assume that the solution set of (5) is nonempty. Then the iterates of (3) (corresponding to $\mathcal {C}=\mathbb {R}^n$ in (5)) and (4) converge toward a solution of (5) if

$0 < \omega < \frac{2}{\Vert M^{1/2}A \Vert^{2}}. \qquad\qquad (6)$

Convergence results for the case when M is not SPD (but nonsingular) can be found in [11, 14, 15, 25, 29] for the unconstrained case, and in [22, 23, 28] for the constrained case.

3.2. Semi-convergence properties

Although the above-mentioned convergence results are interesting and relevant, it is important to remember that for ill-conditioned problems—even for the finite-dimensional problems considered here—the solution set is usually completely deprived of any physical meaning due to the inverted noise. However, numerical results, e.g. [2, 8] show that the SIRT-type methods exhibit semi-convergence so that during the first steps the iterates improve, while later on, due to noise propagation, they start to deteriorate.

The semi-convergence properties when M is SPD were recently investigated in [17] for U-ALG and in [16] for P-ALG, see also [13]. Here we extend this analysis to the case when M is not SPD. Examples of such algorithms include cycles of the Kaczmarz method, and more generally, cycles of block iteration, cf [11, 15].

Let xk and $\bar{x}_k$ denote the iterates of U-ALG using the noisy and the noise-free right-hand side b and $\bar{b}$, respectively, and let $\bar{x}$ be a limit point of (3). Then we can write:

$x_k - \bar{x} = (x_k - \bar{x}_k) + (\bar{x}_k - \bar{x}), \qquad\qquad (7)$

i.e. the error decomposes into two components: the noise error $ x_k - \bar{x}_k$ and the iteration error $\bar{x}_k - \bar{x}$ (in [18, p 157], where Landweber iteration is considered, they are called the 'data error' and the 'approximation error', respectively). It is the interplay between these two error terms that explains the semi-convergence of the method. Classical convergence theory (e.g. theorem 1 above) focuses on the iteration error, and then it is normally assumed that the noise error is negligible. As mentioned above, this is not the case for ill-posed and noisy problems.

To study the noise error we first introduce some notation:

$e_k^{\rm N} = x_k - \bar{x}_k, \qquad Q = I - \omega A^TMA. \qquad\qquad (8)$

Further let

$\widetilde{Q} = Q\, P_{\mathcal{R}(A^T)}, \qquad\qquad (9)$

where $P_{{\mathcal {R}}(A^T)}$ denotes the orthogonal projection onto the range of AT. We assume that

$\Vert \widetilde{Q}\, \Vert < 1, \qquad\qquad (10)$

and we return to this assumption in section 4.1. By (3) it follows that the noise error $e_k^{\rm N}$, defined in (8), satisfies

$e_{k+1}^{\rm N} = Q\, e_k^{\rm N} + \omega A^TM\,\delta b,$

and so it follows, assuming $e_0^{\rm N} = x_0 - \bar{x}_0 = 0$, that

$e_k^{\rm N} = \omega \sum_{j=0}^{k-1} Q^j A^TM\,\delta b. \qquad\qquad (11)$

Let $u = A^TM\,\delta b$; now

$Qu = Q\, P_{\mathcal{R}(A^T)}\, u = \widetilde{Q}\, u$ (since $u \in \mathcal{R}(A^T)$),

and it follows by induction that

$e_k^{\rm N} = \omega \sum_{j=0}^{k-1} \widetilde{Q}^j u. \qquad\qquad (12)$

If we define the quantities $\widetilde{E}_k$ and Ek by

$\widetilde{E}_k = \omega\, \big\Vert \sum_{j=0}^{k-1} \widetilde{Q}^j \big\Vert\, \Vert u \Vert$ and $E_k = \omega\, \Vert u \Vert \sum_{j=0}^{k-1} \Vert \widetilde{Q} \Vert^j, \qquad\qquad (13)$

then they are simple upper bounds for the noise error:

$\Vert e_k^{\rm N} \Vert \le \widetilde{E}_k \le E_k. \qquad\qquad (14)$

These inequalities show how the results from [16] carry over to the case when M is not SPD.
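The decomposition (7) and the bound in (14) can be examined directly by running the same iteration with the noisy and the noise-free right-hand side. The sketch below (our own small-scale illustration with Cimmino weights and a full column rank matrix, so that $\widetilde{Q} = Q$) compares the noise error with the geometric-series bound $E_k$.

import numpy as np

rng = np.random.default_rng(2)
m, n, omega, K = 50, 30, 1.0, 100
A = rng.standard_normal((m, n))
xbar = rng.standard_normal(n)
bbar = A @ xbar
db = rng.standard_normal(m)
db *= 0.01 * np.linalg.norm(bbar) / np.linalg.norm(db)
w = 1.0 / (m * np.sum(A**2, axis=1))            # Cimmino weights

Q = np.eye(n) - omega * A.T @ (w[:, None] * A)  # Q = I - omega A^T M A
q = np.linalg.norm(Q, 2)                        # ||Qtilde|| (= ||Q||, N(A) = {0})
u = A.T @ (w * db)

x = np.zeros(n); xb = np.zeros(n)
for k in range(1, K + 1):
    x  += omega * (A.T @ (w * ((bbar + db) - A @ x)))   # noisy data
    xb += omega * (A.T @ (w * (bbar - A @ xb)))         # noise-free data
print(np.linalg.norm(x - xb),                           # noise error ||e_K^N||
      omega * np.linalg.norm(u) * (1 - q**K) / (1 - q)) # bound E_K dominates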

3.3. Bounds when M is SPD

Assuming that M is SPD and $\omega \in (0, \Vert M^{1/2}A \Vert^{-2})$, the following bound was derived for the unconstrained case in [17, equation (3.4)]:

Equation (15)

Here the ζk are roots of a certain polynomial (they satisfy $0 < \zeta_k < \zeta_{k+1}$ and $\lim_{k\rightarrow \infty} \zeta_k = 1$). Since

$\frac{1-(1-\omega \sigma ^2)^k}{\sigma } \le \sqrt{\omega }\, \sqrt{k}$ for $0 < \sigma < \omega ^{-1/2}$,

we find that

$\Vert e_k^{\rm N} \Vert \;\le\; \sqrt{k}\, \sqrt{\omega }\, \Vert M^{1/2}\,\delta b \Vert. \qquad\qquad (16)$

In [11, lemma 6.2], assuming M = I and ω = 1, the bound $\Vert \delta b \Vert \, \sqrt{k}$ is also derived. The constrained case when M is SPD was considered by Eicke [12] and in [16, equation (3)] where it is shown (assuming $\mathcal {N}(A)=\lbrace 0 \rbrace$) that

$\Vert \hat{e}_k^{\rm N} \Vert \;\le\; \frac{1 - (1-\omega \sigma _n^2)^k}{\sigma _n}\, \Vert M^{1/2}\,\delta b \Vert, \qquad\qquad (17)$

where σn is the smallest singular value of $M^{1/2}A$.

4. Noise-error analysis of Kaczmarz's algorithm and its extensions

We now turn to the core of this paper, namely, the noise-error analysis of Kaczmarz's algorithm. For the sake of generality, we formulate our analysis for the block version (which includes the classical algorithm when each block is a single row). Let A and b be partitioned into p disjoint row-blocks {Ai} and {bi}, respectively. We introduce the matrices

$M_i = (A_iA_i^T)^+, \qquad i = 1, 2, \ldots, p,$

where '+' denotes the 1,3-inverse; see [14] and [32] for definition and implementation details. Then the block Kaczmarz algorithm takes the form:

Block Kaczmarz Algorithm

Initialization: $x_0\in \mathbb {R}^n$ is arbitrary.

Iterative steps: For k = 1, 2, ... and given xk perform the cycle

$x_k^{(0)} = x_k; \qquad x_k^{(i)} = x_k^{(i-1)} + \omega A_i^TM_i\big(b_i - A_ix_k^{(i-1)}\big), \quad i = 1, 2, \ldots, p; \qquad x_{k+1} = x_k^{(p)}. \qquad (18)$

When p = m (each block consists of a single row), $M_i$ is the scalar $M_i = 1/\Vert a_i \Vert^2$. We define one cycle as one pass through all rows.
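A direct transcription of one cycle (18) might look as follows (our own sketch; numpy's pinv plays the role of the 1,3-inverse):

import numpy as np

def block_kaczmarz_cycle(A_blocks, b_blocks, x, omega):
    # One cycle of (18): for i = 1,...,p,
    # x <- x + omega A_i^T M_i (b_i - A_i x) with M_i = (A_i A_i^T)^+.
    for Ai, bi in zip(A_blocks, b_blocks):
        Mi = np.linalg.pinv(Ai @ Ai.T)
        x = x + omega * (Ai.T @ (Mi @ (bi - Ai @ x)))
    return x

With p = m each $A_i$ is a single row $a_i^T$, $M_i$ reduces to the scalar $1/\Vert a_i \Vert^2$, and one call performs a sweep of the classical Kaczmarz method.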

Kaczmarz's method has a long history and rich literature. Originally it was proposed (for p = m) by Kaczmarz [24] and, independently, for use in image reconstruction by Gordon, Bender and Herman [19], see also [22]. An interesting recent development is given in [26–31]. The block case appears in [11, 14, 25]. Kaczmarz's method is sometimes also referred to as ART; but it should be noted that the name ART frequently denotes algebraic reconstruction methods in general.

4.1. Noise-error analysis for the unconstrained case

In order to perform a noise-error analysis of ART, let

$AA^T = L + D + L^T$ and $M_{\mathrm{a}} = (D + \omega L)^{-1},$

where L is strictly block-lower triangular and $D=\mathrm{diag}(A_iA_i^T)$. Note that $M_{\mathrm{a}}$ is nonsingular but, for ω ≠ 0, not symmetric. One cycle, cf [5, 25], can then be written

$x_{k+1} = x_k + \omega A^TM_{\mathrm{a}}(b - Ax_k). \qquad\qquad (19)$

We introduce new notation following (8):

$e_{{\mathrm{a}},k}^{\rm N} = x_k - \bar{x}_k, \qquad Q_{\mathrm{a}} = I - \omega A^TM_{\mathrm{a}}A, \qquad\qquad (20)$

and

$\widetilde{Q}_{\mathrm{a}} = Q_{\mathrm{a}}\, P, \qquad P = P_{\mathcal{R}(A^T)}. \qquad\qquad (21)$

Assumption (10) is now fulfilled due to (25); see, e.g. [14, theorem 10] and [33].
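The equivalence expressed by (19) is easy to check numerically. The following sketch (our own verification for p = m on random data) compares one row-by-row Kaczmarz sweep with the closed form $x + \omega A^TM_{\mathrm{a}}(b - Ax)$.

import numpy as np

rng = np.random.default_rng(3)
m, n, omega = 8, 12, 1.05
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

y = x.copy()                                 # one Kaczmarz sweep, p = m
for i in range(m):
    y += omega * (b[i] - A[i] @ y) / (A[i] @ A[i]) * A[i]

AAT = A @ A.T
D = np.diag(np.diag(AAT))
L = np.tril(AAT, -1)
M_a = np.linalg.inv(D + omega * L)           # M_a = (D + omega L)^{-1}
z = x + omega * (A.T @ (M_a @ (b - A @ x)))  # one step of (19)
print(np.allclose(y, z))                     # True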

Lemma 1. Let $\bar{\omega }=\omega (2-\omega )$ and $R_{\mathrm{a}} = D^{1/2}M_{\mathrm{a}}A$; then

$\sigma_i^2(\widetilde{Q}_{\mathrm{a}}) = \lambda_i\big(P\, (I - \bar{\omega}\, R_{\mathrm{a}}^TR_{\mathrm{a}})\, P\big), \qquad i = 1, \ldots, n.$

Proof. We will show that

$Q_{\mathrm{a}}^TQ_{\mathrm{a}} = I - \bar{\omega}\, R_{\mathrm{a}}^TR_{\mathrm{a}}, \qquad\qquad (22)$

whence the result follows using (2) with $X=\widetilde{Q}_{\mathrm{a}}$, i.e.

$\sigma_i^2(\widetilde{Q}_{\mathrm{a}}) = \lambda_i(\widetilde{Q}_{\mathrm{a}}^T \widetilde{Q}_{\mathrm{a}}) = \lambda_i(P\, Q_{\mathrm{a}}^TQ_{\mathrm{a}}\, P)$

(note that $P^T = P$). Consider the expansion:

$Q_{\mathrm{a}}^TQ_{\mathrm{a}} = I - \omega A^T(M_{\mathrm{a}} + M_{\mathrm{a}}^T)A + \omega^2 A^TM_{\mathrm{a}}^TAA^TM_{\mathrm{a}}A. \qquad\qquad (23)$

Now

$M_{\mathrm{a}} + M_{\mathrm{a}}^T - \omega M_{\mathrm{a}}^TAA^TM_{\mathrm{a}} = M_{\mathrm{a}}^T\big(M_{\mathrm{a}}^{-T} + M_{\mathrm{a}}^{-1} - \omega AA^T\big)M_{\mathrm{a}}. \qquad\qquad (24)$

Then using (24) and (23), we get

$Q_{\mathrm{a}}^TQ_{\mathrm{a}} = I - \omega A^TM_{\mathrm{a}}^T S\, M_{\mathrm{a}}A,$

where we defined

$S = M_{\mathrm{a}}^{-T} + M_{\mathrm{a}}^{-1} - \omega AA^T.$

Using $M_{\mathrm{a}}^{-1} = D+\omega L$ and $M_{\mathrm{a}}^{-T} = D+\omega L^T$ we get

$S = 2D + \omega (L + L^T) - \omega (L + D + L^T) = (2-\omega )\, D,$

so that $Q_{\mathrm{a}}^T Q_{\mathrm{a}}= I-\omega A^T M_{\mathrm{a}}^T S M_{\mathrm{a}}A = I-\bar{\omega } R_{\mathrm{a}}^T R_{\mathrm{a}}$, proving (22) and the lemma. □

Lemma 2. Assume that

$0 < \omega < 2. \qquad\qquad (25)$

Then

$\Vert \widetilde{Q}_{\mathrm{a}} \Vert^2 = 1 - \bar{\omega}\, \sigma_r^2(R_{\mathrm{a}}). \qquad\qquad (26)$

Proof. Let $V=I-\bar{\omega } \, R_{\mathrm{a}}^T R_{\mathrm{a}}$. By (2) and lemma 1

$\Vert \widetilde{Q}_{\mathrm{a}} \Vert^2 = \lambda_{\max}(\widetilde{Q}_{\mathrm{a}}^T \widetilde{Q}_{\mathrm{a}}) = \lambda_{\max}(P\, VP).$

The matrix V is symmetric, so its eigenvectors can be chosen orthogonal, and each of them belongs to either $\mathcal {N}(A)$ or $\mathcal {R}(A^T)$. Let γ be any eigenvalue of V whose eigenvector u belongs to $\mathcal {R}(A^T)$. Then Vu = γu implies PVPu = γu. On the other hand, if $u \in \mathcal {N}(A)$ is any eigenvector of V in the null space of A, then Vu = u and PVPu = 0. We conclude that V and PVP have the same eigenvalues except those corresponding to eigenvectors in $\mathcal {N}(A)$. Since $\bar{\omega } > 0$ (due to assumption (25)) it follows, also using (2), that $\lambda _{\max }(P\, VP) = 1-\bar{\omega }\sigma _r^2(R_{\mathrm{a}})$. □

We can now formulate the central theorem of our paper on the propagation of the noise error in the block Kaczmarz algorithm.

Theorem 2. Let

$\delta_{\mathrm{a}} = \Vert A^TM_{\mathrm{a}}\,\delta b \Vert. \qquad\qquad (27)$

Assuming (25) the noise error in the block Kaczmarz algorithm is bounded above by:

$\Vert e_{{\mathrm{a}},k}^{\rm N} \Vert \;\le\; \delta_{\mathrm{a}}\, \frac{1-(1-\omega \sigma _r^2)^k}{\sigma _r^2} \;\equiv\; E_{{\mathrm{a}},k}, \qquad\qquad (28)$

where σr is the smallest nonzero singular value of $R_{\mathrm{a}} = D^{1/2}M_{\mathrm{a}}A$.

Proof. We first note that:

Using (13) and Taylor expansion we get

Further, again using Taylor expansion,

Combining these two error bounds the result follows. □

Using the inequality $(1 - (1-x^2)^k)/x \le \sqrt{k}$ for x ∈ (0, 1) with $x=\sqrt{\omega }\sigma _r$ the following bound for $E_{{\mathrm{a}},k}$ emerges (where $\sigma _r < 1/\sqrt{\omega }$):

$E_{{\mathrm{a}},k} \;\le\; \sqrt{k}\; \frac{\sqrt{\omega }\,\delta _{\mathrm{a}}}{\sigma _r}. \qquad\qquad (29)$

Note that this bound includes a factor $\sqrt{k}$ similar to the bound for SIRT in (16). The factor $\sqrt{\omega } \delta _{\mathrm{a}}/\sigma _r$ is unrealistically large, but we are not able to give a sharper bound. In the appendix we provide an alternative analysis in terms of eigenvalues; there we show that such an analysis is not guaranteed to sharpen the bound.
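The elementary inequality used above is easily confirmed numerically; a quick check (our own) on a grid of x values:

import numpy as np

x = np.linspace(1e-4, 1 - 1e-4, 1000)
for k in (1, 5, 50, 500):
    lhs = (1 - (1 - x**2)**k) / x
    assert np.all(lhs <= np.sqrt(k) + 1e-12)
print("(1 - (1 - x^2)^k)/x <= sqrt(k) holds on the grid")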

4.2. Extension to general block iteration

In [11] a family of so-called relaxation matrices $\lbrace M_{i,k} \rbrace _{i=1}^p$ is introduced and used in (18), thereby extending the block Kaczmarz algorithm. Convergence conditions are given for bounded relaxation matrices. Following [15] we will, however, assume that the matrices $\lbrace M_i \rbrace _{i=1}^p$ are all SPD and do not depend on k.

The examples given in the previous section all have their block-iterative counterparts in which, unlike block Kaczmarz, the matrices Mi are diagonal. With p = 1 there is just one block, and the method becomes a fully simultaneous iteration. The following result from [15] shows how the cycles of block iteration can be written as in algorithm U-ALG:

$x_{k+1} = x_k + \omega A^TM_{\mathrm{b}}(b - Ax_k), \qquad\qquad (30)$

where

$M_{\mathrm{b}} = (\widehat{D} + \omega L)^{-1}, \qquad \widehat{D} = \mathrm{diag}\big(M_i^{-1}\big). \qquad\qquad (31)$

Convergence conditions are given in [11], [15, proposition 6], [25]:

Theorem 3. The sequence {xk} in (30) converges to a solution of $A^TM_{\mathrm{b}}(b - Ax) = 0$ provided that 0 < ε ⩽ ω ⩽ W where

$W = \frac{2}{\max _{1\le i\le p} \rho (A_i^T M_i A_i)}. \qquad\qquad (32)$

Since $\rho (A_i M_i A_i^T) = \rho (M_i A_i A_i^T)=1$ with $M_i = (A_i A_i^T)^+$, the usual bound ω ∈ (0, 2) is retrieved for ART.

Let $Q_{\mathrm{b}} = I - \omega A^TM_{\mathrm{b}}A$. Elementary calculations, similar to those leading to (22), yield

Hence lemma 1, with $\widetilde{Q}_{\mathrm{b}}= Q_{\mathrm{b}}P$, takes the form

In order to use the previous analysis we must investigate when $\bar{D}$ is SPD so that we can write

Lemma 3. The matrix $\bar{D}=2\widehat{D}-\omega D$ is SPD provided that

$\omega < W = \frac{2}{\max _i \rho (A_i^T M_i A_i)}. \qquad\qquad (33)$

Proof. The matrix $\bar{D}=2\widehat{D}-\omega D=\mathrm{diag}(2M_i^{-1}-\omega A_i A_i^T)$ is SPD if all matrices $2M_i^{-1}-\omega A_i A_i^T$, i = 1, 2, ..., p are SPD. Now

$2M_i^{-1}-\omega A_i A_i^T = M_i^{-1/2}\big(2I - \omega M_i^{1/2} A_i A_i^T M_i^{1/2}\big)M_i^{-1/2},$
which is SPD if $2 - \omega \sigma _1^2 (M_i^{1/2} A_i) > 0$ or $\omega < 2/\rho (A_i^T M_i A_i)$. So if $\omega < W = 2/\max _i \rho (A_i^T M_i A_i)$ then $\bar{D}$ is SPD. □

Similar to lemma 2, we get $\Vert \widetilde{Q}_{\mathrm{b}}\Vert = 1-\bar{\omega }\sigma _r(\bar{R})^2$. Therefore theorem 2 also holds for block iteration but with R replaced by $\bar{R}$ and the upper bound on ω replaced by W.

4.3. Noise-error analysis for the projected case

In order to treat the projected block Kaczmarz algorithm also, we return to P-ALG. Let $\hat{x}_k$ and $\hat{\bar{x}}_k$ denote the iterates of P-ALG using the noisy and the noise-free right-hand side, respectively. If we define the noise error $\hat{e}_k^{\rm N}=\hat{x}_k - \hat{\bar{x}}_k$ we get (with the same notation as in section 3.2), using the non-expansivity property of the metric projection, and assuming that $\hat{e}_0^{\rm N}=0$,

$\Vert \hat{e}_k^{\rm N} \Vert \le \omega \sum_{j=0}^{k-1} \Vert Q \Vert^j\, \Vert A^TM\,\delta b \Vert. \qquad\qquad (34)$

Now if we assume that $\mathcal {N}(A)=\lbrace 0 \rbrace$ then $Q=\widetilde{Q}$, and we get

$\Vert \hat{e}_k^{\rm N} \Vert \le \omega \sum_{j=0}^{k-1} \Vert \widetilde{Q} \Vert^j\, \Vert u \Vert = E_k, \qquad\qquad (35)$

which is the same expression (13) as in the unconstrained case. Hence we find that theorem 2, with $q = q_{\mathrm{a}}$, in this case also holds for the projected block Kaczmarz method.

We next consider the case $\mathcal {N}(A) \ne \lbrace 0 \rbrace$. If the iterates $\lbrace \hat{x}_k\rbrace$ all remain in ${\mathcal {R}}(A^T)$ then again $Q=\widetilde{Q}$ so that theorem 2 holds. We consider this case in more detail as follows. The iteration (4) can be written:

$\hat{y}_k = \hat{x}_k + \omega A^TM(b - A\hat{x}_k), \qquad \hat{x}_{k+1} = P_{\mathcal{C}}(\hat{y}_k). \qquad\qquad (36)$

Note that the projections are performed after each cycle (and not after each iteration).
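In code, the cycle-then-project structure of (36) reads as follows (our own sketch; the box projection onto $[0,1]^n$ is just one concrete choice of $P_{\mathcal {C}}$):

import numpy as np

def projected_block_kaczmarz(A_blocks, b_blocks, x0, omega, n_cycles,
                             proj=lambda v: np.clip(v, 0.0, 1.0)):
    # (36): run one full unprojected cycle, then apply P_C once;
    # the projection is not applied after the individual block updates.
    x = x0.copy()
    for _ in range(n_cycles):
        y = x
        for Ai, bi in zip(A_blocks, b_blocks):
            Mi = np.linalg.pinv(Ai @ Ai.T)
            y = y + omega * (Ai.T @ (Mi @ (bi - Ai @ y)))
        x = proj(y)
    return x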

Lemma 4. Let $ \bar{\hat{x}}_0=\hat{x}_0 \in \mathcal {R}(A^T)$, and assume we have the property:

$x \in \mathcal{R}(A^T) \;\Rightarrow\; P_{\mathcal{C}}(x) \in \mathcal{R}(A^T). \qquad\qquad (37)$

Then $\hat{e}_k^{\rm N} \in \mathcal {R}(A^T)$, ∀k ⩾ 0.

Proof. From the assumption it follows that $\hat{y}_0 \in \mathcal {R}(A^T)$ and hence $\hat{x}_1 \in \mathcal {R}(A^T)$. By induction it follows that $\hat{x}_k \in \mathcal {R}(A^T)$, ∀k ⩾ 0 and, similarly, $\hat{\bar{x}}_k \in \mathcal {R}(A^T)$, ∀k ⩾ 0. □

Condition (37) may be hard to verify. An alternative approach for the case $\mathcal {N}(A) \ne \lbrace 0 \rbrace$ is to consider a regularized problem having full rank; this was done in [16] where the constrained case was treated for M being SPD.

For block iteration using a family of SPD matrices $\lbrace M_i \rbrace _{i=1}^p$ one can of course also define corresponding constrained versions and repeat the arguments above.

5. A numerical example

The purpose of this section is to investigate how the bound in theorem 2 for the noise error in Kaczmarz's method compares with the real noise error. In particular, we will use the upper bound in equation (29), and we will consider the fully sequential version of the algorithm (with p = m) with no constraints as well as with box constraints. We also include the symmetric Kaczmarz method in our test, cf section 2. Both methods use cyclic control.

We use an example from tomographic image reconstruction, implemented in the function paralleltomo in the MATLAB package AIR Tools [21], which generates a sparse matrix A, an exact solution $\bar{x}$ (which represents the Shepp–Logan phantom) and an exact right-hand side $\bar{b} = A\, \bar{x}$ that represents the noise-free data. This test problem models a standard 2D parallel-beam tomography problem. This is the same problem that was used in section 2.

In our case the matrix dimensions are m = 15 232 and n = 40 000 corresponding to a phantom of size 200 × 200, and using 60 projection angles θ = 3, 6, 9, ..., 180°. We added Gaussian white noise δb to the exact data scaled such that $\eta = \Vert \delta b \Vert / \Vert \bar{b} \Vert$ equals 0.01 and 0.05, and we used ω = 1.05 (which is close to optimal as computed by means of the function trainLambdaART in AIR Tools).

The results are shown in figure 5 where we see, as expected, that the noise error increases with the number of iterations k. The upper bound in (29) is proportional to $\sqrt{k}$, which is also plotted for reference, and we see that the actual errors indeed, after 20–30 iterations, exhibit a $\sqrt{k}$-dependence. Hence, this experiment confirms the overall behavior of the bound. Figure 5 also shows the corresponding iteration errors; they decrease as expected, and the errors for the projected methods are smaller.

Figure 5. Left: the noise error versus k for Kaczmarz and symmetric Kaczmarz, with no constraints and with box constraints, together with $\sqrt{k}$ which tracks the behavior of these errors. Right: the corresponding iteration error versus k. We used two different noise levels η = 0.01 (top) and η = 0.05 (bottom).


The bound (29) includes a factor $(\sqrt{\omega }\,\delta _{\mathrm{a}})/\sigma _r$. For this problem we have ω = 1.05 and $\delta_{\mathrm{a}} = \Vert A^TM_{\mathrm{a}}\,\delta b \Vert = 1.2$, but we cannot compute σr, the smallest nonzero singular value of $R_{\mathrm{a}} = D^{1/2}M_{\mathrm{a}}A$, because of the large size of this matrix. Via computations with problems of smaller dimensions, we obtain the extrapolated estimate $\sigma_r = O(10^{-7})$ and hence we can estimate the factor $(\sqrt{\omega }\,\delta _{\mathrm{a}})/\sigma _r = O(10^{7})$, which clearly gives an extremely large over-estimate of the noise error. This is due to the repeated use of norm inequalities in the derivation of the bound.

Our main observation is that while the specific upper bound in (29) gives too large an estimate for the noise error, it still tracks the correct behavior of the error via the factor $\sqrt{k}$. It remains an open problem to sharpen the bound in theorem 2.

6. Conclusion

By means of numerical examples we motivate the use of Kaczmarz's method—a sequential iteration method—over the simultaneous iteration methods. We then present a noise-error analysis of the Kaczmarz method. By splitting the error into two components, the noise error and the iteration error, we provide theoretical insight into the semi-convergence of this method that justifies its well-known behavior. We also give results for the projected Kaczmarz method and show that it exhibits the same qualitative behavior. Finally, we provide numerical examples which illustrate that while our bounds include a very pessimistic factor, they correctly track the $\sqrt{k}$ behavior of the noise error for the methods considered.

Acknowledgments

This work is supported by Advanced grant no. 291405 'High-Definition Tomography' from the European Research Council. We thank the referees for constructive comments that helped to improve the presentation.

Appendix: Noise-error analysis based on eigenvalues

In the paper our analysis of the noise error is based on the SVD. This appendix presents an alternative approach using the eigenvalues of the matrix $\widetilde{Q}=(I-\omega A^TMA)P_{\mathcal{R}(A^T)}$ defined in (8)–(9). Let $\lbrace v^i,\lambda _i\rbrace _{i=1}^n$ be the eigenpairs of $\widetilde{Q}$ with |λ1| ⩾ |λ2| ⩾ ⋅⋅⋅ ⩾ |λn|. Here we assume that $\widetilde{Q}$ has a basis of eigenvectors, i.e. it is nondefective.

A matrix Q is normal if $Q^TQ = QQ^T$; such a matrix is always nondefective and its singular values are σi(Q) = |λi(Q)|. By direct calculation, one easily verifies that $\widetilde{Q}$ is not normal. However, since not all non-normal matrices are defective (but all defective matrices are non-normal), $\widetilde{Q}$ may still be nondefective. Note that defectiveness implies not only the lack of an eigenvector basis but also that the matrix powers no longer depend only on the eigenvalues: while $\widetilde{Q}^k v^i=\lambda _i^k v^i$ for a nondefective matrix, the right-hand side becomes a polynomial in k times eigenvalue powers in the defective case.
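These properties can be probed numerically. The following sketch (our own; a small underdetermined example so that $\mathcal {N}(A) \ne \lbrace 0 \rbrace$) checks the non-normality of $\widetilde{Q}$ for one ART cycle and compares |λ1| with σ1.

import numpy as np

rng = np.random.default_rng(4)
m, n, omega = 6, 10, 1.0                 # m < n, so N(A) is nontrivial
A = rng.standard_normal((m, n))
AAT = A @ A.T
D = np.diag(np.diag(AAT))
L = np.tril(AAT, -1)
M = np.linalg.inv(D + omega * L)         # M_a for one ART cycle
Q = np.eye(n) - omega * A.T @ M @ A
P = np.linalg.pinv(A) @ A                # orthogonal projector onto R(A^T)
Qt = Q @ P
print("normal:", np.allclose(Qt.T @ Qt, Qt @ Qt.T))
print("|lambda_1| =", np.abs(np.linalg.eigvals(Qt)).max(),
      " sigma_1 =", np.linalg.svd(Qt, compute_uv=False)[0])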

With our assumption we can write:

$A^TM\,\delta b = \sum _{i=1}^n \gamma _i\, v^i. \qquad\qquad ({\rm A.1})$

and using (12) we get

$e_k^{\rm N} = \omega \sum _{i=1}^n \gamma _i \Big( \sum _{j=0}^{k-1} \lambda _i^j \Big) v^i.$

By taking norms we get

$\Vert e_k^{\rm N} \Vert \;\le\; \omega \sum _{j=0}^{k-1} |\lambda _1|^j \sum _{i=1}^n |\gamma _i|\, \Vert v^i \Vert.$

By the convergence results for ART we have |λi| < 1 when ω ∈ (0, 2). Hence $\sum _{j=0}^{k-1}| \lambda _1|^j = (1-| \lambda _1|^k)/(1-|\lambda _1|)$. By putting $|\lambda _1| = 1 - x^2$ we find (using that x ∈ (0, 1)) $\sum _{j=0}^{k-1}| \lambda _1|^j=(1-(1-x^2)^k)/x^2 \le \sqrt{k}/x$. It follows that

$\Vert e_k^{\rm N} \Vert \;\le\; \frac{\omega \sqrt{k}}{x} \sum _{i=1}^n |\gamma _i|\, \Vert v^i \Vert. \qquad\qquad ({\rm A.2})$

We next consider two special cases.

  • γ1 ≠ 0 and γi = 0 for i = 2, 3, ..., n. Then $A^TM\,\delta b = \gamma _1 v^1$. Hence by (A.2)
    $\Vert e_k^{\rm N} \Vert \;\le\; \frac{\omega \sqrt{k}}{x}\, \Vert A^TM\,\delta b \Vert. \qquad ({\rm A.3})$
  • γi = 0, i = 1, 2, ..., n − 1 and γn ≠ 0. Then one easily finds
    $\Vert e_k^{\rm N} \Vert = \omega\, \Big| \sum _{j=0}^{k-1} \lambda _n^j \Big|\, \Vert A^TM\,\delta b \Vert. \qquad ({\rm A.4})$
    We see that $\omega \Vert A^TM\,\delta b \Vert$ is a lower bound for the noise error.

We conclude that the eigenvalue analysis does not provide a preferable alternative to the SVD analysis. On the theoretical side, it is not clear that $\widetilde{Q}$ has a basis of eigenvectors, and it is difficult to compare our bound (29) with the above upper bounds since there is no simple relation between the singular values of D1/2MaA and the eigenvalues of $\widetilde{Q}$. On the practical side, it is computationally demanding to estimate both σr and λ1.
