
Semi-convergence properties of Kaczmarz's method

Tommy Elfving, Per Christian Hansen and Touraj Nikazad

Published 1 May 2014 © 2014 IOP Publishing Ltd

Citation: Tommy Elfving et al 2014 Inverse Problems 30 055007. DOI: 10.1088/0266-5611/30/5/055007

Abstract

Kaczmarz's method—sometimes referred to as the algebraic reconstruction technique—is an iterative method that is widely used in tomographic imaging due to its favorable semi-convergence properties. Specifically, when applied to a problem with noisy data, during the early iterations it converges very quickly toward a good approximation of the exact solution, and thus produces a regularized solution. While this property is generally accepted and utilized, there is surprisingly little theoretical justification for it. The purpose of this paper is to present insight into the semi-convergence of Kaczmarz's method as well as its projected counterpart (and their block versions). To do this we study how the data errors propagate into the iteration vectors and we derive upper bounds for this noise propagation. Our bounds are compared with numerical results obtained from tomographic imaging.


1. Introduction

Computed tomography and many other large-scale imaging problems lead to large linear systems of equations (often inconsistent) with noisy data, of the form:

$Ax = b, \qquad b = \bar{b} + \delta b, \qquad\qquad (1)$

where $\bar{b} = A \, \bar{x}$ denotes the exact data and δb is the perturbation consisting of additive noise. The matrix A comes from the discretization of an ill-posed linear problem, such as the Radon transform, and $\bar{x}$ denotes the exact image. The numerical solution of these systems calls for the use of iterative regularization methods, and it is often necessary to incorporate additional constraints into the reconstruction, e.g. non-negativity constraints.
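As a concrete illustration of (1), the following minimal Python sketch (our own; a random matrix stands in for the discretized forward operator, and all names and sizes are illustrative assumptions) builds a noisy right-hand side with a prescribed relative noise level.

import numpy as np

rng = np.random.default_rng(0)
m, n = 60, 40
A = rng.standard_normal((m, n))        # stand-in for the system matrix
xbar = rng.standard_normal(n)          # exact solution
bbar = A @ xbar                        # exact data
eta = 0.01                             # relative noise level
delta_b = rng.standard_normal(m)
delta_b *= eta * np.linalg.norm(bbar) / np.linalg.norm(delta_b)
b = bbar + delta_b                     # noisy right-hand side as in (1)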

We focus here on regularizing iterations with semi-convergence, where the iteration number plays the role of the regularization parameter. In the early stages, the iteration vector approaches a regularized solution, while continuing the iteration leads to iteration vectors deteriorated by noise, cf [2, 3, 8, 16–18, 20, 22, 25]. Some of these papers also discuss the issue of finding a good value for the relaxation parameter (also allowing non-stationary iteration). Recent papers dealing with the important problem of defining stopping criteria are [1] and [30].

Our goal is to study the regularizing properties of Kaczmarz's method, also known as the algebraic reconstruction technique (ART). To do this, we split the error in the iteration vectors into two components: an iteration error and a noise error—the latter due to the noise in the data—and we derive upper bounds for the noise error. The regularizing effects of this method have been discussed and investigated experimentally by several authors, e.g. [8, 34]. To our knowledge, however, a bound for the noise error has not been derived previously.

Our paper is organized as follows. Section 2 presents numerical results that demonstrate the regularizing properties of Kaczmarz's method and give some computational insight into its advantage over other algebraic iterative methods. Section 3 sets the stage for our analysis by presenting the semi-convergence theory for simultaneous iterative reconstruction methods. Our main results are then presented in section 4 where we derive an upper bound for the noise propagation in the (block) Kaczmarz method, both in its 'standard' form and with projections. We also extend the analysis to a more general class of sequential block iterations. We conclude with numerical examples in section 5. Throughout the paper, we use the following notation:

  • || · || denotes the vector and matrix 2-norm.
  • By diag(zi), where the zi can be either scalars or matrices, we mean the (block) diagonal matrix with $z_1, z_2, \ldots$ on its diagonal.
  • With M a symmetric positive definite (SPD) matrix, $\Vert x\Vert _M = \sqrt{x^TMx}$ denotes a weighted Euclidean norm, and $M^{1/2}$ the square root of M.
  • With A a rectangular matrix, $\sigma_i(A)$ denotes its ith singular value; the singular values are ordered such that $\Vert A \Vert = \sigma_1 \geqslant \sigma_2 \geqslant \cdots \geqslant \sigma_r$ with r = rank(A). Moreover, $\mathcal {R}(A)$ and $\mathcal {N}(A)$ denote the range and null space, respectively, of A.
  • With Q a square matrix, ρ(Q) is its spectral radius and $\lambda_i(Q)$ denotes its ith eigenvalue; we recall that:
    $\sigma_i^2(Q) = \lambda_i(Q^TQ), \qquad\qquad (2)$
    assuming the same ordering of the singular values and the eigenvalues.
  • $\mathcal {C}$ denotes a closed convex set and $P_{\mathcal {C}}\!$ is the metric projection onto this set.

2. Motivation

Kaczmarz's algorithm comes in many variants—see, e.g. [22, chapter 11] and [26, 31]. It is often observed that it converges faster than other algebraic iterative methods such as Landweber and Cimmino. To illustrate this we consider two specific algorithms. The first method is the projected Cimmino algorithm with the iteration

$x_{k+1} = P_{\mathcal{C}}\big(x_k + \omega A^TM(b - Ax_k)\big),$

where $M = M_{\mathrm{C}} = {\textstyle \frac{1}{m}}\mathrm{diag}(1/\Vert a_i \Vert ^2)$, $0 < \omega < 2\Vert A^TM_{\mathrm{C}}A\Vert^{-1}$, and $a_i^T$ denotes the ith row of A. The second method is the projected symmetric Kaczmarz algorithm with cyclic control where each cycle, starting with $y_{k,1} = x_k$ and j = 0, involves the updating iterations for i = 1, 2, ..., m, m − 1, m − 2, ..., 3, 2, 1

$y_{k,j+1} = y_{k,j} + \omega\, \frac{b_i - a_i^T y_{k,j}}{\Vert a_i \Vert^2}\, a_i, \qquad j := j + 1,$
where 0 < ω < 2. After this cycle we set $x_{k+1} = P_{\mathcal {C}}(y_{k,2m})$. Note that one cycle of this algorithm involves twice as many flops as one iteration of the Cimmino algorithm. It is well known that the symmetric Kaczmarz algorithm can be written in the same form as the Cimmino algorithm with the SPD matrix $M = M_{\mathrm{S}}$ given by

$M_{\mathrm{S}} = (2-\omega)\,(D+\omega L^T)^{-1} D\, (D+\omega L)^{-1},$
based on the splitting AAT = L + D + LT with D diagonal and L strictly lower triangular. We choose these two methods for our example because they can both be analyzed by means of the singular value decomposition (SVD) [20]; but we emphasize that the superiority of Kaczmarz's algorithm and its variants—over Landweber, Cimmino and related methods—holds in general.
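For illustration, a minimal Python sketch of the two methods follows (our own transcription, not code from the paper; the helper names and the box projection are our choices). It shows one projected Cimmino iteration and one cycle of projected symmetric Kaczmarz.

import numpy as np

def project_box(x, lo=None, hi=None):
    # Metric projection P_C onto a box; lo=hi=None gives C = R^n,
    # lo=0 gives non-negativity, and lo=0, hi=1 the box [0,1]^n.
    if lo is None and hi is None:
        return x
    return np.clip(x, lo, hi)

def cimmino_step(A, b, x, omega, proj=lambda v: v):
    # One projected Cimmino iteration: x <- P_C(x + omega A^T M_C (b - A x)).
    m = A.shape[0]
    w = 1.0 / (m * np.sum(A**2, axis=1))      # diagonal of M_C
    return proj(x + omega * (A.T @ (w * (b - A @ x))))

def symmetric_kaczmarz_cycle(A, b, x, omega, proj=lambda v: v):
    # One cycle of projected symmetric Kaczmarz: sweep the rows in the
    # order 1..m, m-1..1, projecting only once at the end of the cycle.
    m = A.shape[0]
    norms2 = np.sum(A**2, axis=1)
    y = x.copy()
    for i in list(range(m)) + list(range(m - 2, -1, -1)):
        y += omega * (b[i] - A[i] @ y) / norms2[i] * A[i]
    return proj(y)

One call to symmetric_kaczmarz_cycle performs the 2m − 1 row updates of a cycle, in line with the flop count mentioned above.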

Figure 1 shows the relative errors $\Vert x_k - \bar{x} \Vert / \Vert \bar{x} \Vert$ versus k for both algorithms (with $\bar{x}$ denoting the exact solution), and with three different choices of $\mathcal {C}$ corresponding to no constraints ($\mathcal {C}=\mathbb {R}^n$), non-negativity constraints ($\mathcal {C} = \mathbb {R}_+^n$), and box constraints ($\mathcal {C} = [0,1]^n$). The matrix is 15 232 × 40 000 and comes from discretization of a parallel-beam tomography problem (more details about this test problem are given in section 5). The right-hand side in (1) was perturbed with white Gaussian noise δb such that $\Vert \delta b \Vert / \Vert \bar{b} \Vert = 0.01$. The starting vector was zero, and for both algorithms we used the optimal relaxation parameters (ω = 473 for Cimmino and ω = 1.05 for symmetric Kaczmarz) as found by AIR Tools [21]. The best reconstructions (marked by the circles in the error histories) are shown in figure 2. Both algorithms achieve approximately the same accuracy, but symmetric Kaczmarz converges much faster than Cimmino.

Figure 1. Error histories for Cimmino's algorithm and the symmetric Kaczmarz algorithm without constraints, with non-negativity constraints, and with box constraints; note the logarithmic k-axis. For both algorithms we used the optimal relaxation parameters.

Figure 2. Exact image $\bar{x}$ and best solutions xk obtained by the methods from figure 1. All solutions are shown as 200 × 200 images.


To obtain more insight into the performance of the two algorithms, we computed the singular values of A as well as of the two matrices $M_{\mathrm{C}}^{1/2}A$ and $M_{\mathrm{S}}^{1/2}A$ associated with Cimmino and symmetric Kaczmarz. For computational reasons we used a smaller matrix of size 7556 × 10 000. The results are shown in figure 3. For both A and Cimmino, we observe the 'standard' behavior where the singular values decay gradually to zero. For symmetric Kaczmarz, on the other hand, there is a large cluster of singular values close to 1, and the first 250 singular values are identical to 1 within rounding errors. The large cluster of singular values close to 1 means that we capture a lot of SVD components in a few cycles. This intuitively explains the fast convergence of symmetric Kaczmarz compared to Cimmino.
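The computation behind figure 3 can be mimicked at small scale; the sketch below (our own illustration on random data, using the expression for $M_{\mathrm{S}}$ given above and scipy's sqrtm for the matrix square root) compares the three sets of singular values.

import numpy as np
from scipy.linalg import sqrtm

rng = np.random.default_rng(1)
m, n, omega = 30, 40, 1.05
A = rng.standard_normal((m, n))

M_C = np.diag(1.0 / (m * np.sum(A**2, axis=1)))   # Cimmino weights

AAT = A @ A.T
D = np.diag(np.diag(AAT))
L = np.tril(AAT, -1)                              # strictly lower part
B = np.linalg.inv(D + omega * L)
M_S = (2 - omega) * B.T @ D @ B                   # SPD for 0 < omega < 2

print("A:             ", np.linalg.svd(A, compute_uv=False)[[0, -1]].round(3))
for name, M in [("Cimmino:       ", M_C), ("sym. Kaczmarz: ", M_S)]:
    sv = np.linalg.svd(np.real(sqrtm(M)) @ A, compute_uv=False)
    print(name, sv[[0, -1]].round(6))             # largest and smallest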

Figure 3. The singular values of A and those of the matrices $M_{\mathrm{C}}^{1/2}A$ and $M_{\mathrm{S}}^{1/2}A$ associated with Cimmino and symmetric Kaczmarz.


An inspection of the right singular vectors—the basis vectors of the solutions—sheds more light on the appearance of the solutions, see figure 4. For the matrix A as well as $M_{\mathrm{C}}^{1/2}A$ for Cimmino we see the traditional behavior, namely, that the vectors provide a spectral basis where low-frequency information is associated with the largest singular values. For symmetric Kaczmarz, we see a very different behavior: all singular vectors of $M_{\mathrm{S}}^{1/2}A$ have low, medium, and high frequencies present in them. Here, the matrix $M_{\mathrm{S}}^{1/2}$ acts as a 'preconditioner' that changes the right singular vectors so that they do not form a spectral basis. The observation that high frequencies are present in all singular vectors explains intuitively why we can compute solutions with sharp edges (and other fine structures) even after a few cycles, because high frequencies are always present in the singular vectors.

Figure 4. Some right singular vectors (shown as 100 × 100 images) of A (top row), $M_{\mathrm{C}}^{1/2}A$ for Cimmino (middle row), and $M_{\mathrm{S}}^{1/2}A$ for symmetric Kaczmarz (bottom row).


3. Setting the stage: analysis of SIRT-type algorithms

Before studying the Kaczmarz algorithm and its projected version, in this section we consider algorithms related to the simultaneous iterative reconstruction technique (SIRT) methods. In particular we study their semi-convergence, and we extend some results from an earlier paper [16] on this subject. It should be noted that a semi-convergence analysis, cf [18, 20], consists of two parts: noise-error analysis and iteration-error analysis (i.e. convergence for exact data). In the following we will use 'convergence' to mean convergence for exact data (as is the usual convention).

3.1. Convergence properties

We consider the following two algorithms—an unconstrained algorithm and its constrained variant.

Choose arbitrary initial vectors $x_0,\hat{x}_0 \in \mathbb {R}^n$ and update the iteration vectors xk and $\hat{x}_k$ by means of

$x_{k+1} = x_k + \omega A^TM(b - Ax_k)$ (U-ALG) (3)

$\hat{x}_{k+1} = P_{\mathcal{C}}\big(\hat{x}_k + \omega A^TM(b - A\hat{x}_k)\big)$ (P-ALG) (4)

Here ω > 0 is the relaxation parameter, and we assume that M is nonsingular—notice that we do not require M to be SPD.

When M is also SPD, several well-known methods can be written in the form (3) for appropriate choices of the matrix M. With M equal to the identity we get the classical (projected) Landweber method, e.g. [18, 20]. Cimmino's method [10] is obtained with $M = M_{\mathrm{C}} = \frac{1}{m}\mathrm{diag}(1/\Vert a_i \Vert ^2)$ where ai denotes the ith row of A. The component averaging method [9] uses $M = \mathrm{diag}(1/\sum _{j=1}^n N_j a_{ij}^2)$ where Nj is the number of non-zeroes in the jth column of A. For applications of the SIRT methods in imaging and related areas see [2, 3, 6, 35, 36].
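A sketch of (3) and (4) with a diagonal M, covering the Landweber, Cimmino and CAV weightings listed above, might look as follows (our own illustration; the function and argument names are assumptions, and proj plays the role of $P_{\mathcal {C}}$).

import numpy as np

def u_alg(A, b, omega, w, n_iter, proj=None, x0=None):
    # U-ALG / P-ALG with diagonal M = diag(w):
    # x <- x + omega A^T M (b - A x), optionally followed by P_C.
    x = np.zeros(A.shape[1]) if x0 is None else x0.copy()
    for _ in range(n_iter):
        x = x + omega * (A.T @ (w * (b - A @ x)))
        if proj is not None:
            x = proj(x)                    # this branch is P-ALG
    return x

def landweber_weights(A):
    return np.ones(A.shape[0])             # M = I

def cimmino_weights(A):
    m = A.shape[0]
    return 1.0 / (m * np.sum(A**2, axis=1))   # M_C = (1/m) diag(1/||a_i||^2)

def cav_weights(A):
    N = np.count_nonzero(A, axis=0)        # N_j = nonzeros in column j
    return 1.0 / (A**2 @ N)                # M = diag(1 / sum_j N_j a_ij^2)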

For the case where M is SPD we define the following constrained weighted least-squares problem associated with the SIRT methods:

$\min_{x \in \mathcal{C}}\ {\textstyle\frac{1}{2}}\, \Vert Ax - b \Vert_M^2. \qquad\qquad (5)$

For the unconstrained problem (${\mathcal {C}}=\mathbb {R}^n$) the minimization problem in (5) always has a solution, whereas the constrained problem has a solution for any right-hand side b if and only if $A({\mathcal {C}})$, the image set of ${\mathcal {C}}$, is closed, cf [7, p 444], [27, p 442]. This is the case, e.g. when ${\mathcal {C}}$ is bounded or when the objective function is coercive, i.e. the null space of A is trivial, $\mathcal {N}(A)=\lbrace 0 \rbrace$. Note that in this case the objective function is strictly convex and a unique solution exists. We have the following convergence result (see, e.g. [4]).

Theorem 1. Assume that the solution set of (5) is nonempty. Then the iterates of (3) (corresponding to $\mathcal {C}=\mathbb {R}^n$ in (5)) and (4) converge toward a solution of (5) if

$0 < \omega < \frac{2}{\Vert M^{1/2}A \Vert^{2}}. \qquad\qquad (6)$

Convergence results for the case when M is not SPD (but nonsingular) can be found in [11, 14, 15, 25, 29] for the unconstrained case, and in [22, 23, 28] for the constrained case.

3.2. Semi-convergence properties

Although the above-mentioned convergence results are interesting and relevant, it is important to remember that for ill-conditioned problems—even for the finite-dimensional problems considered here—the solution set is usually completely deprived of any physical meaning due to the inverted noise. However, numerical results, e.g. [2, 8] show that the SIRT-type methods exhibit semi-convergence so that during the first steps the iterates improve, while later on, due to noise propagation, they start to deteriorate.

The semi-convergence properties when M is SPD were recently investigated in [17] for U-ALG and in [16] for P-ALG, see also [13]. Here we extend this analysis to the case when M is not SPD. Examples of such algorithms include cycles of the Kaczmarz method, and more generally, cycles of block iteration, cf [11, 15].

Let xk and $\bar{x}_k$ denote the iterates of U-ALG using the noisy and the noise-free right-hand side b and $\bar{b}$, respectively, and let $\bar{x}$ be a limit point of (3). Then we can write:

$x_k - \bar{x} = (x_k - \bar{x}_k) + (\bar{x}_k - \bar{x}), \qquad\qquad (7)$

i.e. the error decomposes into two components: the noise error $ x_k - \bar{x}_k$ and the iteration error $\bar{x}_k - \bar{x}$ (in [18, p 157], where Landweber iteration is considered, they are called the 'data error' and the 'approximation error', respectively). It is the interplay between these two error terms that explains the semi-convergence of the method. Classical convergence theory (e.g. theorem 1 above) focuses on the iteration error, and then it is normally assumed that the noise error is negligible. As mentioned above, this is not the case for ill-posed and noisy problems.

To study the noise error we first introduce some notation:

$e_k^{\rm N} = x_k - \bar{x}_k, \qquad Q = I - \omega A^TMA. \qquad\qquad (8)$

Further let

$\widetilde{Q} = Q\, P_{\mathcal{R}(A^T)}, \qquad\qquad (9)$

where $P_{{\mathcal {R}}(A^T)}$ denotes the orthogonal projection onto the range of AT. We assume that

$\Vert \widetilde{Q}\, \Vert < 1, \qquad\qquad (10)$

and we return to this assumption in section 4.1. By (3) it follows that the noise error $e_k^{\rm N}$, defined in (8), satisfies

$e_{k+1}^{\rm N} = Q\, e_k^{\rm N} + \omega A^TM\,\delta b,$

and so it follows, assuming $e_0^{\rm N} = x_0 - \bar{x}_0 = 0$, that

$e_k^{\rm N} = \omega \sum_{j=0}^{k-1} Q^j A^TM\,\delta b. \qquad\qquad (11)$

Let $u = A^TM\,\delta b$; now

$Qu = Q\, P_{\mathcal{R}(A^T)}\, u = \widetilde{Q}\, u$ (since $u \in \mathcal{R}(A^T)$),

and it follows by induction that

$e_k^{\rm N} = \omega \sum_{j=0}^{k-1} \widetilde{Q}^j u. \qquad\qquad (12)$

If we define the quantities $\widetilde{E}_k$ and Ek by

$\widetilde{E}_k = \omega\, \big\Vert \sum_{j=0}^{k-1} \widetilde{Q}^j \big\Vert\, \Vert u \Vert$ and $E_k = \omega\, \Vert u \Vert \sum_{j=0}^{k-1} \Vert \widetilde{Q} \Vert^j, \qquad\qquad (13)$

then they are simple upper bounds for the noise error:

$\Vert e_k^{\rm N} \Vert \le \widetilde{E}_k \le E_k. \qquad\qquad (14)$

These inequalities show how the results from [16] carry over to the case when M is not SPD.
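The decomposition (7) and the bound in (14) can be examined directly by running the same iteration with the noisy and the noise-free right-hand side. The sketch below (our own small-scale illustration with Cimmino weights and a full column rank matrix, so that $\widetilde{Q} = Q$) compares the noise error with the geometric-series bound $E_k$.

import numpy as np

rng = np.random.default_rng(2)
m, n, omega, K = 50, 30, 1.0, 100
A = rng.standard_normal((m, n))
xbar = rng.standard_normal(n)
bbar = A @ xbar
db = rng.standard_normal(m)
db *= 0.01 * np.linalg.norm(bbar) / np.linalg.norm(db)
w = 1.0 / (m * np.sum(A**2, axis=1))            # Cimmino weights

Q = np.eye(n) - omega * A.T @ (w[:, None] * A)  # Q = I - omega A^T M A
q = np.linalg.norm(Q, 2)                        # ||Qtilde|| (= ||Q||, N(A) = {0})
u = A.T @ (w * db)

x = np.zeros(n); xb = np.zeros(n)
for k in range(1, K + 1):
    x  += omega * (A.T @ (w * ((bbar + db) - A @ x)))   # noisy data
    xb += omega * (A.T @ (w * (bbar - A @ xb)))         # noise-free data
print(np.linalg.norm(x - xb),                           # noise error ||e_K^N||
      omega * np.linalg.norm(u) * (1 - q**K) / (1 - q)) # bound E_K dominates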

3.3. Bounds when M is SPD

Assuming that M is SPD and $\omega \in (0, \Vert M^{1/2}A \Vert^{-2})$, the following bound was derived for the unconstrained case in [17, equation (3.4)]:

Equation (15)

Here the ζk are roots of a certain polynomial (they satisfy $0 < \zeta_k < \zeta_{k+1}$ and $\lim_{k\rightarrow \infty} \zeta_k = 1$). Since

$\frac{1-(1-\omega \sigma ^2)^k}{\sigma } \le \sqrt{\omega }\, \sqrt{k}$ for $0 < \sigma < \omega ^{-1/2}$,

we find that

$\Vert e_k^{\rm N} \Vert \;\le\; \sqrt{k}\, \sqrt{\omega }\, \Vert M^{1/2}\,\delta b \Vert. \qquad\qquad (16)$

In [11, lemma 6.2], assuming M = I and ω = 1, the bound $\Vert \delta b \Vert \, \sqrt{k}$ is also derived. The constrained case when M is SPD was considered by Eicke [12] and in [16, equation (3)] where it is shown (assuming $\mathcal {N}(A)=\lbrace 0 \rbrace$) that

$\Vert \hat{e}_k^{\rm N} \Vert \;\le\; \frac{1 - (1-\omega \sigma _n^2)^k}{\sigma _n}\, \Vert M^{1/2}\,\delta b \Vert, \qquad\qquad (17)$

where σn is the smallest singular value of $M^{1/2}A$.

4. Noise-error analysis of Kaczmarz's algorithm and its extensions

We now turn to the core of this paper, namely, the noise-error analysis of Kaczmarz's algorithm. For the sake of generality, we formulate our analysis for the block version (which includes the classical algorithm when each block is a single row). Let A and b be partitioned into p disjoint row-blocks {Ai} and {bi}, respectively. We introduce the matrices

$M_i = (A_iA_i^T)^+, \qquad i = 1, 2, \ldots, p,$

where '+' denotes the 1,3-inverse; see [14] and [32] for definition and implementation details. Then the block Kaczmarz algorithm takes the form:

Block Kaczmarz Algorithm

Initialization: $x_0\in \mathbb {R}^n$ is arbitrary.

Iterative steps: For k = 1, 2, ... and given xk perform the cycle

$x_k^{(0)} = x_k; \qquad x_k^{(i)} = x_k^{(i-1)} + \omega A_i^TM_i\big(b_i - A_ix_k^{(i-1)}\big), \quad i = 1, 2, \ldots, p; \qquad x_{k+1} = x_k^{(p)}. \qquad (18)$

When p = m (each block consists of a single row), $M_i$ is the scalar $M_i = 1/\Vert a_i \Vert^2$. We define one cycle as one pass through all rows.
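A direct transcription of one cycle (18) might look as follows (our own sketch; numpy's pinv plays the role of the 1,3-inverse):

import numpy as np

def block_kaczmarz_cycle(A_blocks, b_blocks, x, omega):
    # One cycle of (18): for i = 1,...,p,
    # x <- x + omega A_i^T M_i (b_i - A_i x) with M_i = (A_i A_i^T)^+.
    for Ai, bi in zip(A_blocks, b_blocks):
        Mi = np.linalg.pinv(Ai @ Ai.T)
        x = x + omega * (Ai.T @ (Mi @ (bi - Ai @ x)))
    return x

With p = m each $A_i$ is a single row $a_i^T$, $M_i$ reduces to the scalar $1/\Vert a_i \Vert^2$, and one call performs a sweep of the classical Kaczmarz method.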

Kaczmarz's method has a long history and rich literature. Originally it was proposed (for p = m) by Kaczmarz [24] and, independently, for use in image reconstruction by Gordon, Bender and Herman [19], see also [22]. An interesting recent development is given in [26–31]. The block case appears in [11, 14, 25]. Kaczmarz's method is sometimes also referred to as ART; but it should be noted that the name ART frequently denotes algebraic reconstruction methods in general.

4.1. Noise-error analysis for the unconstrained case

In order to perform a noise-error analysis of ART, let

$AA^T = L + D + L^T$ and $M_{\mathrm{a}} = (D + \omega L)^{-1},$

where L is strictly block-lower triangular and $D=\mathrm{diag}(A_iA_i^T)$. Note that $M_{\mathrm{a}}$ is nonsingular but, for ω ≠ 0, not symmetric. One cycle, cf [5, 25], can then be written

$x_{k+1} = x_k + \omega A^TM_{\mathrm{a}}(b - Ax_k). \qquad\qquad (19)$

We introduce new notation following (8):

$e_{{\mathrm{a}},k}^{\rm N} = x_k - \bar{x}_k, \qquad Q_{\mathrm{a}} = I - \omega A^TM_{\mathrm{a}}A, \qquad\qquad (20)$

and

$\widetilde{Q}_{\mathrm{a}} = Q_{\mathrm{a}}\, P, \qquad P = P_{\mathcal{R}(A^T)}. \qquad\qquad (21)$

Assumption (10) is now fulfilled due to (25); see, e.g. [14, theorem 10] and [33].
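The equivalence expressed by (19) is easy to check numerically. The following sketch (our own verification for p = m on random data) compares one row-by-row Kaczmarz sweep with the closed form $x + \omega A^TM_{\mathrm{a}}(b - Ax)$.

import numpy as np

rng = np.random.default_rng(3)
m, n, omega = 8, 12, 1.05
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)
x = rng.standard_normal(n)

y = x.copy()                                 # one Kaczmarz sweep, p = m
for i in range(m):
    y += omega * (b[i] - A[i] @ y) / (A[i] @ A[i]) * A[i]

AAT = A @ A.T
D = np.diag(np.diag(AAT))
L = np.tril(AAT, -1)
M_a = np.linalg.inv(D + omega * L)           # M_a = (D + omega L)^{-1}
z = x + omega * (A.T @ (M_a @ (b - A @ x)))  # one step of (19)
print(np.allclose(y, z))                     # True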

Lemma 1. Let $\bar{\omega }=\omega (2-\omega )$ and $R_{\mathrm{a}} = D^{1/2}M_{\mathrm{a}}A$; then

$\sigma_i^2(\widetilde{Q}_{\mathrm{a}}) = \lambda_i\big(P\, (I - \bar{\omega}\, R_{\mathrm{a}}^TR_{\mathrm{a}})\, P\big), \qquad i = 1, \ldots, n.$

Proof. We will show that

$Q_{\mathrm{a}}^TQ_{\mathrm{a}} = I - \bar{\omega}\, R_{\mathrm{a}}^TR_{\mathrm{a}}, \qquad\qquad (22)$

whence the result follows using (2) with $X=\widetilde{Q}_{\mathrm{a}}$, i.e.

$\sigma_i^2(\widetilde{Q}_{\mathrm{a}}) = \lambda_i(\widetilde{Q}_{\mathrm{a}}^T \widetilde{Q}_{\mathrm{a}}) = \lambda_i(P\, Q_{\mathrm{a}}^TQ_{\mathrm{a}}\, P)$

(note that $P^T = P$). Consider the expansion:

$Q_{\mathrm{a}}^TQ_{\mathrm{a}} = I - \omega A^T(M_{\mathrm{a}} + M_{\mathrm{a}}^T)A + \omega^2 A^TM_{\mathrm{a}}^TAA^TM_{\mathrm{a}}A. \qquad\qquad (23)$

Now

$M_{\mathrm{a}} + M_{\mathrm{a}}^T - \omega M_{\mathrm{a}}^TAA^TM_{\mathrm{a}} = M_{\mathrm{a}}^T\big(M_{\mathrm{a}}^{-T} + M_{\mathrm{a}}^{-1} - \omega AA^T\big)M_{\mathrm{a}}. \qquad\qquad (24)$

Then using (24) and (23), we get

$Q_{\mathrm{a}}^TQ_{\mathrm{a}} = I - \omega A^TM_{\mathrm{a}}^T S\, M_{\mathrm{a}}A,$

where we defined

$S = M_{\mathrm{a}}^{-T} + M_{\mathrm{a}}^{-1} - \omega AA^T.$

Using $M_{\mathrm{a}}^{-1} = D+\omega L$ and $M_{\mathrm{a}}^{-T} = D+\omega L^T$ we get

$S = 2D + \omega (L + L^T) - \omega (L + D + L^T) = (2-\omega )\, D,$

so that $Q_{\mathrm{a}}^T Q_{\mathrm{a}}= I-\omega A^T M_{\mathrm{a}}^T S M_{\mathrm{a}}A = I-\bar{\omega } R_{\mathrm{a}}^T R_{\mathrm{a}}$, proving (22) and the lemma. □

Lemma 2. Assume that

$0 < \omega < 2. \qquad\qquad (25)$

Then

$\Vert \widetilde{Q}_{\mathrm{a}} \Vert^2 = 1 - \bar{\omega}\, \sigma_r^2(R_{\mathrm{a}}). \qquad\qquad (26)$

Proof. Let $V=I-\bar{\omega } \, R_{\mathrm{a}}^T R_{\mathrm{a}}$. By (2) and lemma 1

$\Vert \widetilde{Q}_{\mathrm{a}} \Vert^2 = \lambda_{\max}(\widetilde{Q}_{\mathrm{a}}^T \widetilde{Q}_{\mathrm{a}}) = \lambda_{\max}(P\, VP).$

The matrix V is symmetric, so its eigenvectors can be chosen orthogonal, and each of them belongs to either $\mathcal {N}(A)$ or $\mathcal {R}(A^T)$. Let γ be any eigenvalue of V whose eigenvector u belongs to $\mathcal {R}(A^T)$. Then Vu = γu implies PVPu = γu. On the other hand, if $u \in \mathcal {N}(A)$ is any eigenvector of V in the null space of A, then Vu = u and PVPu = 0. We conclude that V and PVP have the same eigenvalues except those corresponding to eigenvectors in $\mathcal {N}(A)$. Since $\bar{\omega } > 0$ (due to assumption (25)) it follows, also using (2), that $\lambda _{\max }(P\, VP) = 1-\bar{\omega }\sigma _r^2(R_{\mathrm{a}})$. □

We can now formulate the central theorem of our paper on the propagation of the noise error in the block Kaczmarz algorithm.

Theorem 2. Let

$\delta_{\mathrm{a}} = \Vert A^TM_{\mathrm{a}}\,\delta b \Vert. \qquad\qquad (27)$

Assuming (25) the noise error in the block Kaczmarz algorithm is bounded above by:

$\Vert e_{{\mathrm{a}},k}^{\rm N} \Vert \;\le\; \delta_{\mathrm{a}}\, \frac{1-(1-\omega \sigma _r^2)^k}{\sigma _r^2} \;\equiv\; E_{{\mathrm{a}},k}, \qquad\qquad (28)$

where σr is the smallest nonzero singular value of $R_{\mathrm{a}} = D^{1/2}M_{\mathrm{a}}A$.

Proof. We first note that:

Using (13) and Taylor expansion we get

Further, again using Taylor expansion,

Combining these two error bounds the result follows. □

Using the inequality $(1 - (1-x^2)^k)/x \le \sqrt{k}$ for x ∈ (0, 1) with $x=\sqrt{\omega }\sigma _r$ the following bound for $E_{{\mathrm{a}},k}$ emerges (where $\sigma _r < 1/\sqrt{\omega }$):

$E_{{\mathrm{a}},k} \;\le\; \sqrt{k}\; \frac{\sqrt{\omega }\,\delta _{\mathrm{a}}}{\sigma _r}. \qquad\qquad (29)$

Note that this bound includes a factor $\sqrt{k}$ similar to the bound for SIRT in (16). The factor $\sqrt{\omega } \delta _{\mathrm{a}}/\sigma _r$ is unrealistically large, but we are not able to give a sharper bound. In the appendix we provide an alternative analysis in terms of eigenvalues; there we show that such an analysis is not guaranteed to sharpen the bound.
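The elementary inequality used above is easily confirmed numerically; a quick check (our own) on a grid of x values:

import numpy as np

x = np.linspace(1e-4, 1 - 1e-4, 1000)
for k in (1, 5, 50, 500):
    lhs = (1 - (1 - x**2)**k) / x
    assert np.all(lhs <= np.sqrt(k) + 1e-12)
print("(1 - (1 - x^2)^k)/x <= sqrt(k) holds on the grid")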

4.2. Extension to general block iteration

In [11] a family of so-called relaxation matrices $\lbrace M_{i,k} \rbrace _{i=1}^p$ is introduced and used in (18), thereby extending the block Kaczmarz algorithm. Convergence conditions are given for bounded relaxation matrices. Following [15] we will, however, assume that the matrices $\lbrace M_i \rbrace _{i=1}^p$ are all SPD and do not depend on k.

The examples given in the previous section all have their block-iterative counterparts in which, unlike block Kaczmarz, the matrices Mi are diagonal. With p = 1 there is just one block, and the method becomes a fully simultaneous iteration. The following result from [15] shows how the cycles of block iteration can be written as in algorithm U-ALG:

$x_{k+1} = x_k + \omega A^TM_{\mathrm{b}}(b - Ax_k), \qquad\qquad (30)$

where

$M_{\mathrm{b}} = (\widehat{D} + \omega L)^{-1}, \qquad \widehat{D} = \mathrm{diag}\big(M_i^{-1}\big). \qquad\qquad (31)$

Convergence conditions are given in [11], [15, proposition 6], [25]:

Theorem 3. The sequence {xk} in (30) converges to a solution of $A^TM_{\mathrm{b}}(b - Ax) = 0$ provided that 0 < ε ⩽ ω ⩽ W where

$W = \frac{2}{\max _{1\le i\le p} \rho (A_i^T M_i A_i)}. \qquad\qquad (32)$

Since $\rho (A_i M_i A_i^T) = \rho (M_i A_i A_i^T)=1$ with $M_i = (A_i A_i^T)^+$, the usual bound ω ∈ (0, 2) is retrieved for ART.

Let $Q_{\mathrm{b}} = I - \omega A^TM_{\mathrm{b}}A$. Elementary calculations, similar to those leading to (22), yield

Hence lemma 1, with $\widetilde{Q}_{\mathrm{b}}= Q_{\mathrm{b}}P$, takes the form

In order to use the previous analysis we must investigate when $\bar{D}$ is SPD so that we can write

Lemma 3. The matrix $\bar{D}=2\widehat{D}-\omega D$ is SPD provided that

$\omega < W = \frac{2}{\max _i \rho (A_i^T M_i A_i)}. \qquad\qquad (33)$

Proof. The matrix $\bar{D}=2\widehat{D}-\omega D=\mathrm{diag}(2M_i^{-1}-\omega A_i A_i^T)$ is SPD if all matrices $2M_i^{-1}-\omega A_i A_i^T$, i = 1, 2, ..., p are SPD. Now

$2M_i^{-1}-\omega A_i A_i^T = M_i^{-1/2}\big(2I - \omega M_i^{1/2} A_i A_i^T M_i^{1/2}\big)M_i^{-1/2},$
which is SPD if $2 - \omega \sigma _1^2 (M_i^{1/2} A_i) > 0$ or $\omega < 2/\rho (A_i^T M_i A_i)$. So if $\omega < W = 2/\max _i \rho (A_i^T M_i A_i)$ then $\bar{D}$ is SPD. □

Similar to lemma 2, we get $\Vert \widetilde{Q}_{\mathrm{b}}\Vert = 1-\bar{\omega }\sigma _r(\bar{R})^2$. Therefore theorem 2 also holds for block iteration but with R replaced by $\bar{R}$ and the upper bound on ω replaced by W.

4.3. Noise-error analysis for the projected case

In order to treat the projected block Kaczmarz algorithm also, we return to P-ALG. Let $\hat{x}_k$ and $\hat{\bar{x}}_k$ denote the iterates of P-ALG using the noisy and the noise-free right-hand side, respectively. If we define the noise error $\hat{e}_k^{\rm N}=\hat{x}_k - \hat{\bar{x}}_k$ we get (with the same notation as in section 3.2), using the non-expansivity property of the metric projection, and assuming that $\hat{e}_0^{\rm N}=0$,

$\Vert \hat{e}_k^{\rm N} \Vert \le \omega \sum_{j=0}^{k-1} \Vert Q \Vert^j\, \Vert A^TM\,\delta b \Vert. \qquad\qquad (34)$

Now if we assume that $\mathcal {N}(A)=\lbrace 0 \rbrace$ then $Q=\widetilde{Q}$, and we get

$\Vert \hat{e}_k^{\rm N} \Vert \le \omega \sum_{j=0}^{k-1} \Vert \widetilde{Q} \Vert^j\, \Vert u \Vert = E_k, \qquad\qquad (35)$

which is the same expression (13) as in the unconstrained case. Hence we find that theorem 2, with $q = q_{\mathrm{a}}$, in this case also holds for the projected block Kaczmarz method.

We next consider the case $\mathcal {N}(A) \ne \lbrace 0 \rbrace$. If the iterates $\lbrace \hat{x}_k\rbrace$ all remain in ${\mathcal {R}}(A^T)$ then again $Q=\widetilde{Q}$ so that theorem 2 holds. We consider this case in more detail as follows. The iteration (4) can be written:

$\hat{y}_k = \hat{x}_k + \omega A^TM(b - A\hat{x}_k), \qquad \hat{x}_{k+1} = P_{\mathcal{C}}(\hat{y}_k). \qquad\qquad (36)$

Note that the projections are performed after each cycle (and not after each iteration).
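In code, the cycle-then-project structure of (36) reads as follows (our own sketch; the box projection onto $[0,1]^n$ is just one concrete choice of $P_{\mathcal {C}}$):

import numpy as np

def projected_block_kaczmarz(A_blocks, b_blocks, x0, omega, n_cycles,
                             proj=lambda v: np.clip(v, 0.0, 1.0)):
    # (36): run one full unprojected cycle, then apply P_C once;
    # the projection is not applied after the individual block updates.
    x = x0.copy()
    for _ in range(n_cycles):
        y = x
        for Ai, bi in zip(A_blocks, b_blocks):
            Mi = np.linalg.pinv(Ai @ Ai.T)
            y = y + omega * (Ai.T @ (Mi @ (bi - Ai @ y)))
        x = proj(y)
    return x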

Lemma 4. Let $ \bar{\hat{x}}_0=\hat{x}_0 \in \mathcal {R}(A^T)$, and assume we have the property:

$x \in \mathcal{R}(A^T) \;\Rightarrow\; P_{\mathcal{C}}(x) \in \mathcal{R}(A^T). \qquad\qquad (37)$

Then $\hat{e}_k^{\rm N} \in \mathcal {R}(A^T)$, ∀k ⩾ 0.

Proof. From the assumption it follows that $\hat{y}_0 \in \mathcal {R}(A^T)$ and hence $\hat{x}_1 \in \mathcal {R}(A^T)$. By induction it follows that $\hat{x}_k \in \mathcal {R}(A^T)$, ∀k ⩾ 0 and, similarly, $\hat{\bar{x}}_k \in \mathcal {R}(A^T)$, ∀k ⩾ 0. □

Condition (37) may be hard to verify. An alternative approach for the case $\mathcal {N}(A) \ne \lbrace 0 \rbrace$ is to consider a regularized problem having full rank; this was done in [16] where the constrained case was treated for M being SPD.

For block iteration using a family of SPD matrices $\lbrace M_i \rbrace _{i=1}^p$ one can of course also define corresponding constrained versions and repeat the arguments above.

5. A numerical example

The purpose of this section is to investigate how the bound in theorem 2 for the noise error in Kaczmarz's method compares with the real noise error. In particular, we will use the upper bound in equation (29), and we will consider the fully sequential version of the algorithm (with p = m) with no constraints as well as with box constraints. We also include the symmetric Kaczmarz method in our test, cf section 2. Both methods use cyclic control.

We use an example from tomographic image reconstruction, implemented in the function paralleltomo in the MATLAB package AIR Tools [21], which generates a sparse matrix A, an exact solution $\bar{x}$ (which represents the Shepp–Logan phantom) and an exact right-hand side $\bar{b} = A\, \bar{x}$ that represents the noise-free data. This test problem models a standard 2D parallel-beam tomography problem. This is the same problem that was used in section 2.

In our case the matrix dimensions are m = 15 232 and n = 40 000 corresponding to a phantom of size 200 × 200, and using 60 projection angles θ = 3, 6, 9, ..., 180°. We added Gaussian white noise δb to the exact data scaled such that $\eta = \Vert \delta b \Vert / \Vert \bar{b} \Vert$ equals 0.01 and 0.05, and we used ω = 1.05 (which is close to optimal as computed by means of the function trainLambdaART in AIR Tools).

The results are shown in figure 5 where we see, as expected, that the noise error increases with the number of iterations k. The upper bound in (29) is proportional to $\sqrt{k}$, which is also plotted for reference, and we see that the actual errors indeed, after 20–30 iterations, exhibit a $\sqrt{k}$-dependence. Hence, this experiment confirms the overall behavior of the bound. Figure 5 also shows the corresponding iteration errors; they decrease as expected, and the errors for the projected methods are smaller.

Figure 5. Left: the noise error versus k for Kaczmarz and symmetric Kaczmarz, with no constraints and with box constraints, together with $\sqrt{k}$ which tracks the behavior of these errors. Right: the corresponding iteration error versus k. We used two different noise levels η = 0.01 (top) and η = 0.05 (bottom).


The bound (29) includes a factor $(\sqrt{\omega }\,\delta _{\mathrm{a}})/\sigma _r$. For this problem we have ω = 1.05 and $\delta_{\mathrm{a}} = \Vert A^TM_{\mathrm{a}}\,\delta b \Vert = 1.2$, but we cannot compute σr, the smallest nonzero singular value of $R_{\mathrm{a}} = D^{1/2}M_{\mathrm{a}}A$, because of the large size of this matrix. Via computations with problems of smaller dimensions, we obtain the extrapolated estimate $\sigma_r = O(10^{-7})$ and hence we can estimate the factor $(\sqrt{\omega }\,\delta _{\mathrm{a}})/\sigma _r = O(10^{7})$, which clearly gives an extremely large over-estimate of the noise error. This is due to the repeated use of norm inequalities in the derivation of the bound.

Our main observation is that while the specific upper bound in (29) gives too large an estimate for the noise error, it still tracks the correct behavior of the error via the factor $\sqrt{k}$. It remains an open problem to sharpen the bound in theorem 2.

6. Conclusion

By means of numerical examples we motivate the use of Kaczmarz's method—a sequential iteration method—over the simultaneous iteration methods. We then present a noise-error analysis of the Kaczmarz method. By splitting the error into two components, the noise error and the iteration error, we provide theoretical insight into the semi-convergence of this method that justifies its well-known behavior. We also give results for the projected Kaczmarz method and show that it exhibits the same qualitative behavior. Finally, we provide numerical examples which illustrate that while our bounds include a very pessimistic factor, they correctly track the $\sqrt{k}$ behavior of the noise error for the methods considered.

Acknowledgments

This work is supported by Advanced grant no. 291405 'High-Definition Tomography' from the European Research Council. We thank the referees for constructive comments that helped to improve the presentation.

Appendix: Noise-error analysis based on eigenvalues

In the paper our analysis of the noise error is based on the SVD. This appendix presents an alternative approach using the eigenvalues of the matrix $\widetilde{Q}=(I-\omega A^TMA)P_{\mathcal{R}(A^T)}$ defined in (8)–(9). Let $\lbrace v^i,\lambda _i\rbrace _{i=1}^n$ be the eigenpairs of $\widetilde{Q}$ with |λ1| ⩾ |λ2| ⩾ ⋅⋅⋅ ⩾ |λn|. Here we assume that $\widetilde{Q}$ has a basis of eigenvectors, i.e. it is nondefective.

A matrix Q is normal if $Q^TQ = QQ^T$; such a matrix is always nondefective and its singular values are σi(Q) = |λi(Q)|. By direct calculation, one easily verifies that $\widetilde{Q}$ is not normal. However, since not all non-normal matrices are defective (but all defective matrices are non-normal), $\widetilde{Q}$ may still be nondefective. Note that defectiveness implies not only the lack of an eigenvector basis but also that the matrix powers no longer depend only on the eigenvalues: while $\widetilde{Q}^k v^i=\lambda _i^k v^i$ for a nondefective matrix, the right-hand side becomes a polynomial in k times eigenvalue powers in the defective case.
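These properties can be probed numerically. The following sketch (our own; a small underdetermined example so that $\mathcal {N}(A) \ne \lbrace 0 \rbrace$) checks the non-normality of $\widetilde{Q}$ for one ART cycle and compares |λ1| with σ1.

import numpy as np

rng = np.random.default_rng(4)
m, n, omega = 6, 10, 1.0                 # m < n, so N(A) is nontrivial
A = rng.standard_normal((m, n))
AAT = A @ A.T
D = np.diag(np.diag(AAT))
L = np.tril(AAT, -1)
M = np.linalg.inv(D + omega * L)         # M_a for one ART cycle
Q = np.eye(n) - omega * A.T @ M @ A
P = np.linalg.pinv(A) @ A                # orthogonal projector onto R(A^T)
Qt = Q @ P
print("normal:", np.allclose(Qt.T @ Qt, Qt @ Qt.T))
print("|lambda_1| =", np.abs(np.linalg.eigvals(Qt)).max(),
      " sigma_1 =", np.linalg.svd(Qt, compute_uv=False)[0])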

With our assumption we can write:

$A^TM\,\delta b = \sum _{i=1}^n \gamma _i\, v^i. \qquad\qquad ({\rm A.1})$

and using (12) we get

$e_k^{\rm N} = \omega \sum _{i=1}^n \gamma _i \Big( \sum _{j=0}^{k-1} \lambda _i^j \Big) v^i.$

By taking norms we get

$\Vert e_k^{\rm N} \Vert \;\le\; \omega \sum _{j=0}^{k-1} |\lambda _1|^j \sum _{i=1}^n |\gamma _i|\, \Vert v^i \Vert.$

By the convergence results for ART we have |λi| < 1 when ω ∈ (0, 2). Hence $\sum _{j=0}^{k-1}| \lambda _1|^j = (1-| \lambda _1|^k)/(1-|\lambda _1|)$. By putting $|\lambda _1| = 1 - x^2$ we find (using that x ∈ (0, 1)) $\sum _{j=0}^{k-1}| \lambda _1|^j=(1-(1-x^2)^k)/x^2 \le \sqrt{k}/x$. It follows that

$\Vert e_k^{\rm N} \Vert \;\le\; \frac{\omega \sqrt{k}}{x} \sum _{i=1}^n |\gamma _i|\, \Vert v^i \Vert. \qquad\qquad ({\rm A.2})$

We next consider two special cases.

  • γ1 ≠ 0 and γi = 0 for i = 2, 3, ..., n. Then $A^TM\,\delta b = \gamma _1 v^1$. Hence by (A.2)
    $\Vert e_k^{\rm N} \Vert \;\le\; \frac{\omega \sqrt{k}}{x}\, \Vert A^TM\,\delta b \Vert. \qquad ({\rm A.3})$
  • γi = 0, i = 1, 2, ..., n − 1 and γn ≠ 0. Then one easily finds
    $\Vert e_k^{\rm N} \Vert = \omega\, \Big| \sum _{j=0}^{k-1} \lambda _n^j \Big|\, \Vert A^TM\,\delta b \Vert. \qquad ({\rm A.4})$
    We see that $\omega \Vert A^TM\,\delta b \Vert$ is a lower bound for the noise error.

We conclude that the eigenvalue analysis does not provide a preferable alternative to the SVD analysis. On the theoretical side, it is not clear that $\widetilde{Q}$ has a basis of eigenvectors, and it is difficult to compare our bound (29) with the above upper bounds since there is no simple relation between the singular values of D1/2MaA and the eigenvalues of $\widetilde{Q}$. On the practical side, it is computationally demanding to estimate both σr and λ1.
