Enhancing joint reconstruction and segmentation with non-convex Bregman iteration

Veronica Corona; Martin Benning; Matthias J Ehrhardt; Lynn F Gladden; Richard Mair; Andi Reci; Andrew J Sederman; Stefanie Reichelt; Carola-Bibiane Schönlieb

doi:10.1088/1361-6420/ab0b77

1. Introduction

Image reconstruction plays a central role in many imaging modalities for medical and non-medical applications. The majority of imaging techniques deal with incomplete data and noise, making the inverse problem of reconstruction severely ill-posed. Based on compressed sensing (CS) it is possible to tackle this problem by exploiting prior knowledge of the signal [1–3]. Nevertheless, reconstructions from very noisy and undersampled data will present some errors that will be propagated into further analysis, e.g. image segmentation. Segmentation is an image processing task used to partition the image into meaningful regions. Its goal is to identify objects of interest, based on contours or similarities in the interior. Typically segmentation is performed after reconstruction, hence its result strongly depends on the quality of the reconstruction. Recently the idea of combining reconstruction and segmentation has become more popular. The main motivation is to avoid error propagations that occur in the sequential approach by estimating edges simultaneously from the data, ultimately improving the reconstruction. In this paper, we propose a new model for joint reconstruction and segmentation from undersampled magnetic resonance imaging (MRI) data. The underlying idea is to incorporate prior knowledge about the objects that we want to segment in the reconstruction step, thus introducing additional regularity in our solution. In this unified framework, we expect that the segmentation will also benefit from sharper reconstructions. We demonstrate that our joint approach improves the reconstruction quality and yields better segmentations compared to sequential approaches. In figure 1, we consider a brain phantom from which we simulated the undersampled k-space data and added Gaussian noise. Figures 1(b) and (e) present reconstructions and segmentations obtained with the sequential approaches, while figures 1(c) and (f) show the results for our joint approach. The reconstruction using our method shows clearly more details and it is able to detect finer structures that are not recovered with the classical separate approach. As a consequence, the joint segmentation is also improved. In the following section we present the mathematical models that we used in our comparison. We investigated the performance of our model for two different applications: bubbly flow and cancer imaging. We show that both reconstruction and segmentation benefit from this method, compared to the traditional sequential approaches, suggesting that error propagation is reduced.

**Figure 1.** Sequential approach (left) versus unified approach (right). Combining reconstruction and segmentation in a single unified approach improves both the reconstructed image and its segmentation. See figure 2 for more details. (a) Groundtruth. (b) Sequential reconstruction. (c) Joint reconstruction. (d) Sampling matrix. (e) Sequential segmentation. (f) Joint segmentation.
Download figure:
Standard image High-resolution image

Our contribution. In our proposed joint method, we obtain an image reconstruction that preserves its intrinsic structures and edges, possibly enhancing them, thanks to the joint segmentation, and simultaneously we achieve an accurate segmentation. We consider the edge-preserving total variation regularisation for both the reconstruction and segmentation term using Bregman distances. In this unified Bregman iteration framework, we have the advantage of improving the reconstruction by reducing the contrast bias in the TV formulation, which leads to more accurate segmentation. In addition, the segmentation constitutes another prior for the reconstruction by enhancing edges of the regions of interest. Furthermore, we propose a non-convex alternating direction algorithm in a Bregman iteration scheme for which we prove global convergence.

The paper is organised as follows. In section 2 we describe the problems of MRI reconstruction and region-based segmentation. We then introduce our joint reconstruction and segmentation approach in a Bregman iteration framework. This section also contains a detailed comparison of other joint models in the literature. In section 3 we study the non-convex optimisation problem and present the convergence analysis for this class of problems. Finally in section 4 we present numerical results for MRI data for different applications. Here we investigate the robustness of our model by testing the undersampling rate up to its limit and by considering different noise levels.

2. MRI reconstruction and segmentation

In the following section we introduce the mathematical tools to perform image reconstruction and image segmentation. In this work, we focus on the specific MRI application; however, our proposed joint method can be applied to other imaging problems in which the measured data is connected to the image via a linear and bounded forward operator, see section 2.1. Finally we present our model that combines the two tasks of reconstruction and segmentation in a unified framework.

2.1. Reconstruction

In image reconstruction problems, we have the general setting

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle f=Au+\eta, \label{eq:generalforward} \nonumber \end{align} \tag{ 1 }$

where $A:\mathbb{X}\to \mathbb{Y}$ is a bounded and linear operator mapping between two vector-spaces. The measured data $f\in \mathbb{Y}$ is usually corrupted by some noise $\newcommand{\e}{{\rm e}} \eta$ and often only observed partially. In this formulation we are interested in recovering the image u given the data f .

In this work, we focus on the application of MRI and we refer to the measurements f as the k-space data. In standard MRI acquisitions, the Fourier coefficients are collected in the k-space by radio-frequency (RF) coils. Because the k-space data is acquired sequentially, the scanning time is constrained by physical limitations of the imaging system. One of the most common ways to perform fast imaging consists of undersampling the k-space; this, however, only yields satisfactory results if the dimension of the parameter space can implicitly be reduced, for example by exploiting sparsity in certain domains. In the reconstruction, this assumption is incorporated in the regularisation term. Let $\Omega:=\{1, \dots , n_1 \} \times \{1, \dots , n_2 \}$ with $n_1, n_2 \in \mathbb{N}$ be a discrete image domain. Let $f= (\,f_i)_{i=1}^m \in \mathbb{C}^m$ with $m\ll n=n_1 n_2$ be our given undersampled k-space data, where $f_i \in \mathbb{C}$ are the measured Fourier coefficients that fulfil the relationship (1) with A = SF. The operator A is now composed by $\mathcal{S}:\mathbb{C}^n \to \mathbb{C}^m$ , which is a sampling operator that selects m measurements from the Fu data according to the locations provided by a binary sampling matrix (see e.g. figure 1(d)), where F is the discrete Fourier transform. In MRI, the noise $\newcommand{\e}{{\rm e}} \eta$ is drawn from a complex-valued Gaussian distribution with zero mean and standard deviation $\sigma$ [4].

In problem (1) for MRI, the aim is to recover the image $u \in \mathbb{C}^n$ from the data. However, in this work we follow the standard assumption that in many applications we have negligible phase, i.e. we are working with real valued, non-negative images. Therefore, we are only interested in $u \in \mathbb{R}^n$ ; hence we consider the MRI forward operator as $A : \mathbb{R}^{n} \to \mathbb{C}^m$ and its adjoint $A^*:\mathbb{C}^m \to \mathbb{R}^{n}$ as modelled in [5]. Problem (1) is ill-posed due to noise and incomplete measurements. The easiest approach to approximate (1) is to compute the solution, for which the missing entries are replaced with zero:

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle u_z= A^* f. \nonumber \end{align*}$

However, images reconstructed with this approach will suffer from aliasing artifacts because undersampling the k-space violates the Nyquist–Shannon sampling theorem. Therefore, we consider a mathematical model that incorporates prior knowledge by using a variational regularisation approach. A popular model is to find an approximate solution for u as a minimiser of the Tikhonov-type regularisation approach:

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle u^* \in \mathop{{\rm arg}~{\rm min}}\limits_{u} \Big\{\frac{1}{2} \| Au - f\|_{2}^{2} + \alpha J(u) \Big\}, \label{eq:tik} \nonumber \end{align} \tag{ 2 }$

where the first term is the data fidelity that forces the reconstruction to be close to the measurements and the second term is the regularisation, which imposes some regularity on the solution. The parameter $\alpha > 0$ is a regularisation parameter that balances the two terms in the variational scheme. In this setting, different regularisation functionals J can be chosen (see [6] for a survey of variational regularisation approaches).

Although problems of the form (2) are very effective, they also lead to a systematic loss of contrast [7–9]. This is typically observed for common choices of the regulariser J, i.e. convex functional. To overcome this problem, [10] proposed an iterative regularisation method based on the generalised Bregman distance [11, 12]. The Bregman distance with respect to J is defined as

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle D_{J}^{\,p^k}(u,u^k)= J(u) - J (u^k) - \langle\,p^k,u-u^k \rangle, \label{bregmandistance} \nonumber \end{align} \tag{ 3 }$

with $p^k \in \partial J(u^k)$ , where $\partial J (u^k)$ is called sub-differential and it is a generalisation of the classical differential for convex functions. We replace problem (2) with a sequence of minimisation problems:

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle u^{k+1} \in \mathop{{\rm arg}~{\rm min}}\limits_{u} \Big\{\frac{1}{2} \| Au - f\|_{2}^{2} + \alpha D_{J}^{\,p^k}(u,u^k) \Big\}. \label{eq:rec_breg_J} \nonumber \end{align} \tag{ 4 }$

The update on the subgradient can be conveniently computed by the optimality condition of (4):

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle p^{k+1} = p^k - \frac{1}{\alpha} A^* (Au^{k+1} - f). \label{eq:pk} \nonumber \end{align} \tag{ 5 }$

In this work, we will focus on one particular choice for J, namely the total variation. The total variation (TV) regularisation is a well-known edge-preserving approach, first introduced by Rudin et al in [13] for image denoising. The TV regularisation, i.e. the 1-norm penalty on a discrete finite difference approximation of the 2D gradient $\nabla : \mathbb{R}^n \to (\mathbb{R}^2){}^n$ , that is $\nabla u(i,j) = (\nabla_1 u(i,j), \nabla_2 u(i,j)){}^T$ , is in the discrete setting

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle J(u)={\rm TV}(u)= \| \nabla u \|_{2,1} = \sum_{(i,j)\in \Omega} \sqrt{| \nabla_1 u(i,j) |^2 + | \nabla_2 u(i,j) |^2} , \label{eq:dtv} \nonumber \end{align} \tag{ 6 }$

for the isotropic case.

We then consider the Bregman iteration scheme in (4) for $J(u)={\rm TV}(u)$ . This approach is usually carried on by initialising the regularisation parameter $\alpha$ with a large value, producing overregularised initial solutions. At every step k, finer details are added. A suitable criterion to stop iterations (4) and (5) (see [6]), is the Morozov's discrepancy principle [14]. The discrepancy principle suggests to choose the smallest $k \in \mathbb{N}$ such that u^k+1 satisfies

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \| f - Au^{k+1}\|_2 \leqslant \sigma \sqrt{m}, \label{eq:discrepancy} \nonumber \end{align} \tag{ 7 }$

where m is the number of samples and $\sigma$ is the standard deviation of the noise in the data. Note that using Bregman iterations, the contrast is improved and in some cases even recovered exactly, compared to the variational regularisation model. In addition, it makes the regularisation parameter choice less challenging. Note that for different choices of J in (2), e.g. the Mumford–Shah/Potts model [15–19], we do not have loss of contrast, but we deal with a non-convex NP hard problem, algorithmically more challenging.

2.2. Segmentation

Image segmentation refers to the process of automatically dividing the image into meaningful regions. Mathematically, one is interested in finding a partition $\{\Omega_i\}_{i=1}^l$ of the image domain $\Omega$ subject to $\cup_{i=1}^l \Omega_i = \Omega$ and $\newcommand{\e}{{\rm e}} \cap_{i=1}^l \Omega_i = \emptyset$ . One way to do this is to use region-based segmentation models, which identify regions based on similarities of their pixels. The segmentation model we are considering was originally proposed by Chan and Vese in [20] and it is a particular case of the piecewise-constant Mumford–Shah model [15]. Given an image function $u:\Omega \to \mathbb{R}$ , the goal is to divide the image domain $\Omega$ in two separated regions $\Omega_1$ and $\Omega_2=\Omega \setminus \Omega_1$ by minimising the following energy function

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \int_{\Omega_1} (u(x) - c_1)^2 \,{\rm d}x + \int_{\Omega_2} (u(x) - c_2)^2 \,{\rm d}x + \beta \cdot {\rm Length}(C) \to \min_{c_1,c_2,C}, \nonumber \end{align*}$

where C is the desired contour separating $\Omega_1$ and $\Omega_2$ , and the constants c₁ and c₂ represents the average intensity value of u inside C and outside C, respectively. The parameter $\beta$ penalises the length of the contour C, controlling the scale of the objects in the segmentation. From this formulation we can make two observations: first, the regions $\Omega_1$ and $\Omega \setminus \Omega_1$ can be represented by the characteristic function

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle v(x)= \left\{\begin{array}{@{}lll@{}} 0, & {{\rm if~}} x \in \Omega_1 \cup C \nonumber \\ 1, & {{\rm if~}} x \in \Omega_2. \end{array}\right. \nonumber \end{align*}$

Second, the perimeter of the contour identified by the the characteristic function corresponds to its total variation, as shown by the Coarea formula [21]. This leads to the new formulation

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \int_{\Omega} v(x) (u(x) - c_1)^2 {\rm d}x + (1-v(x)) (u(x) - c_2)^2 \,{\rm d}x + \beta {\rm TV}(v) \to \min_{c_1,c_2, v \in \{0,1\}}. \nonumber \end{align*}$

Even assuming fixed constants c₁, c₂ the problem is non-convex due to the binary constraint. In [22] the authors proposed to relax the constraint, allowing $v(x)$ to assume values in the interval $[0,1]$ . They showed that for fixed constants c₁, c₂, global minimisers can be obtained by minimising the following energy:

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \int_{\Omega} v(x) (u(x) - c_1)^2 {\rm d}x + (1-v(x)) (u(x) - c_2)^2\, {\rm d}x + \beta {\rm TV}(v) \to \min_{v \in [0,1]}, \label{eq:CEN} \nonumber \end{align} \tag{ 8 }$

followed by thresholding, setting $\Sigma = \{x : v(x) \geqslant \mu \} \; {{\rm for~a.e.~}}\mu\in [0, 1]$ . As the problem is convex but not strictly convex, the global minimiser may not be unique. In practice we obtain solutions which are almost binary, hence the choice of $\mu$ is not crucial.

Setting

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle s(x)=(u(x) - c_1)^2 - (u(x) - c_2)^2 \nonumber \end{align*}$

the energy (8) can be written in a more general form as

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \int_{\Omega} v(x) s(x)\, {\rm d}x + \beta {\rm TV}(v) \to \min_{v\in [0,1]}. \nonumber \end{align*}$

In this paper, we are interested in the extension of the two-class problem to the multi-class formulation [23]. Following the simplex-constrained vector function representation for multiple regions and its convex relaxation proposed in [24], we obtain as a special case a convex relaxation of the Chan–Vese model for arbitrary number of regions, which reads

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \int_{\Omega} \sum_{i=1}^{\ell} v_i(x) (c_i - u(x))^2\, {\rm d}x + \beta {\rm TV}(v) \to \min_{v\in \mathcal{C}}, \label{eq:seg_multi} \nonumber \end{align} \tag{ 9 }$

where $\newcommand{\e}{{\rm e}} \newcommand{\bi}{\boldsymbol} \mathcal{C}:= \{v:\Omega \to \mathbb{R}^\ell \,\big\vert \, v(x)\geqslant 0, \sum\nolimits_{i=1}^\ell v_i(x)=1\}$ is a convex set which restricts $v(x)$ to lie in the standard probability simplex. As in the binary case, the constants c_i describe the average intensity value inside region i. In this case we consider the vector-valued formulation of TV:

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle {\rm TV}(v)=\int_{\Omega} \sqrt{\| \nabla v_1 \|^2 + \dots +\| \nabla v_\ell \|^2}\, {\rm d}x. \nonumber \end{align*}$

2.3. Joint reconstruction and segmentation

MRI reconstructions from highly undersampled data are subject to errors, even when prior knowledge about the underlying object is incorporated in the mathematical model. It is often required to find a trade-off between filtering out the noise and retrieving the intrinsic structures while preserving the intensity configuration and small details. As a consequence, segmentations in the presence of artifacts are likely to fail.

In this paper, we propose to solve the two image processing tasks of reconstruction and segmentation in a unified framework. The underlying idea is to inform the reconstruction with prior knowledge of the regions of interest, and simultaneously update this belief according to the actual measurements. Mathematically, given the under-sampled and noisy k-space data f , we want to recover the image $u \colon \Omega \to \mathbb{R}$ and compute its segmentation $v$ in $\newcommand{\e}{{\rm e}} \ell$ disjoint regions, by solving the following problem:

$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\im}{{\rm Im}} \displaystyle \nonumber (u,v) &= \mathop{{\rm arg}~{\rm min}}\limits_{u,v} \underbrace{\frac{1}{2} \| Au - f\|_{2}^{2} + \alpha {\rm TV}(u)}_{\rm reconstruction}\nonumber \\ &+\underbrace{\delta \sum_{i=1}^n \sum_{j=1}^{\ell} v_{ij} (c_j - u_i)^2 + \beta {\rm TV}(v) + \imath_\mathcal{C}(v)}_{\rm segmentation}, \label{eq:jointTV} \nonumber \end{align} \tag{ 10 }$

where $\newcommand{\im}{{\rm Im}} \imath_\mathcal{C}(v)$ is the characteristic function over $\newcommand{\e}{{\rm e}} \newcommand{\bi}{\boldsymbol} \mathcal{C} := \{v: \mathbb{R}^n \to \mathbb{R}^\ell \,\big\vert \, v_{ij}\geqslant 0, \sum\nolimits_{j=1}^\ell v_{ij}=1, \forall$ $\newcommand{\e}{{\rm e}} \newcommand{\bi}{\boldsymbol} i \in \{1,\dots,n\}\}$ , and $\alpha$ , $\beta$ , $\delta >0$ are some regularisation parameters. However, instead of solving (10), we consider the iterative regularisation procedure using Bregman distances. The main motivation is to exploit the contrast enhancement aspect for the reconstruction thanks to the Bregman iterative scheme. By improving the reconstruction, the segmentation is in turn refined. Therefore, we replace (10) with the following sequence of minimisation problems for $k=0,1,2, \dots$

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle u^{k+1} = \mathop{{\rm arg}~{\rm min}}\limits_{u} \frac{1}{2} \| Au - f\|_{2}^{2} + \alpha D_{{\rm TV}}^{\,p^k}(u,u^k) +\delta \sum_{i=1}^n \sum_{j=1}^{\ell} v_{ij} (c_j - u_i)^2 \label{eq:joint_u} \nonumber \end{align} \tag{ 11a }$

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle p^{k+1} = p^k - \frac{1}{\alpha} \left(A^*(Au^{k+1} - f) - 2 \delta \sum_{j=1}^\ell v_j^k (u^{k+1}-c_j) \right) \label{eq:joint_p} \nonumber \end{align} \tag{ 11b }$

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle v^{k+1} \in \ \mathop{{\rm arg}~{\rm min}}\limits_{v} \delta \sum_{i=1}^n \sum_{j=1}^{\ell} v_{ij} (c_j - u_i^{k+1})^2 + \chi_\mathcal{C}(v)+ \beta D_{{\rm TV}}^{q^k}(v,v^k) \label{eq:joint_v} \nonumber \end{align} \tag{ 11c }$

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle q^{k+1} =q^k - \frac{\delta}{\beta} (c_j-u^{k+1})^2.\label{eq:joint_q} \nonumber \end{align} \tag{ 11d }$

Note that (11) solves a problem different from (10). Assuming that a minimiser exists, the model (11) converges to a minimiser of

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \frac{1}{2} \| Au - f\|_{2}^{2} + \delta \sum_{i=1}^n \sum_{j=1}^{\ell} v_{ij} (c_j - u_i^{k+1})^2 \, , \nonumber \end{align*}$

as we will show in section 3.1. In case of noisy data f this is not desirable, so that we combine the iteration with a stopping criterion in order to form a regularisation method.

This model combines the reconstruction approach described in (4) and the discretised multi-class segmentation in (9) with a variation in the regularisation term, which is now embedded in the Bregman iteration scheme. In [25] the authors used Bregman distances for the Chan–Vese formulation (8), combined with spectral analysis, to produce multiscale segmentations.

As described in the previous subsection, the parameters $\alpha$ and $\beta$ describe the scale of the details in u and the scale of the segmented objects in $v$ . By integrating the two regularisations into the same Bregman iteration framework, we obtain that these scales are now determined by the iteration k + 1. At the first Bregman iteration k = 0, when $\alpha$ is very large, we obtain an over-smoothed u¹, and the value of $\beta$ is not very important. Intuitively, u¹ is almost piecewise constant with small total variation and a broad range of values of $\beta$ may lead to very similar segmentations $v^1$ . However, at every iteration k + 1, finer scales are added to the solution with the update p ^k+1. Accordingly, with the update q^k+1, which is independent of $v^{k+1}$ , the segmentation keeps up with the scale in the reconstructed image u^k+1.

The novelty of this approach is also represented by the role of the parameter $\delta >0$ . This parameter weighs the effect of the segmentation in the reconstruction, imposing regularity in u in terms of sharp edges in the regions of interest. In section 4 we show how different ranges of $\delta$ affects the reconstruction (see figure 12). Intuitively, large values of $\delta$ force the solution u to be close to the piecewise constant solution described by the constants c_i. This is beneficial in applications where MRI is a means to extract shapes and sizes of underlying objects (e.g. bubbly flow in section 4.1). On the other hand, with very small $\delta$ , the segmentation has little impact and the solutions for u are close to the ones obtained by solving the individual problem (4). Instead, intermediate values of $\delta$ impose sharper boundaries in the reconstruction while preserving the texture.

Obviously, we need to stop the iteration before the residual brings back noise from the data f . As we cannot use Morozov discrepancy principle in this case (due to the fact that $\| Au^k - f \|_2$ will rather increase due to the effect of the coupling term controlled by the parameter $\delta$ ), we stop when two consecutive iterates in $v$ are smaller than a certain tolerance, $\|v^{k+1}-v^k\| < tol$ , following the observation that the rate at which u^k+1 changes close to the optimal solution is low, in contrary to more abrupt changes at the beginning of the Bregman iteration and later on when it starts to add noise.

Clearly, problem (11) is non-convex in the joint argument $(u,v)$ due to the coupling term. However, it is convex in each individual variable. We propose to solve the joint problem by iteratively alternating the minimisation with respect to u and to $v$ (see section 3 for numerical optimisation and convergence analysis).

2.4. Comparison to other joint reconstruction and segmentation approaches

In this section we will provide an overview of some existing simultaneous reconstruction and segmentation (SRS) approaches with respect to different imaging applications.

2.4.1. CT/SPECT.

Ramlau and Ring [26] first proposed a simultaneous reconstruction and segmentation model for CT, that was later extended to SPECT in [27] and to limited data tomography [28]. In these work, the authors aim to simultaneously reconstruct and segment the data acquired from SPECT and CT. CT measures the mass density distribution $\mu$ , that represents the attenuation of x-rays through the material; SPECT measures the activity distribution f as the concentration of the radio tracer injected in the material. Given the two measurements $z^{\delta}$ and $y^{\delta}$ , from CT and SPECT, they consider the following energy functional

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle E(\,f,\mu,\Gamma^{\,f},\Gamma^{\mu}) = \|A(\,f,\mu) - y^{\delta} \|^2 + \beta \|R \mu -z^{\delta}\|^2 +\alpha ({\rm Length}(\Gamma^{\,f})+ {\rm Length}(\Gamma^{\mu})). \nonumber \end{align*}$

They propose a joint model based on a Mumford–Shah-like functional, in which the reconstructions of $\mu$ and f and the given data are embedded in the data term in a least squares sense. The operators A and R are the attenuated Radon transform (SPECT operator) and the Radon transform (CT operator), respectively. The penalty term is considered to be a multiple of the lengths of the contours of $\mu$ , $\Gamma^{\mu}$ and the contours of f , $\Gamma^{\,f}$ . These boundaries are modelled using level set functions. In these segmented partitions of the domain, $\mu$ and f are assumed to be piecewise constant. The optimisation problem is then solved alternatively with respect to the functional variables f and $\mu$ with fixed geometric variables $\Gamma^{\mu}$ and $\Gamma^{\,f}$ and the other way around.

In [29] the simultaneous reconstruction and segmentation is applied to dynamic SPECT imaging, which solves a variational framework consisting of a Kullback–Leibler (KL) data fidelity and different regulariser terms to enforce sharp edges and sparsity for the segmentation and smoothness for the reconstruction. The cost function is

$\begin{align*} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle E(u,c) = KL\big(R(u\cdot c),g\big) + \alpha \sum_{k=1}^K \|\nabla u_k\| + \frac{\delta}{2} \sum_{k=1}^K \| \frac{\partial}{\partial t} c_k \|^2_2. \nonumber \end{align*}$

Given the data g, they want to retreive the concentration curves c_k(t) in time for K disjoint regions and their indication functions u_k(x) in space. The optimisation is carried out alternating the minimisation over u having c fixed and then over c having u fixed.

In [30] they propose a variational approach for reconstruction and segmentation of CT images, with limited field of view and occluded geometry. The cost function

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle E(u,c,v)=\frac{1}{2} \|Ax-y\|^2 + \alpha \| \nabla u\| + \frac{\beta}{2} \left(\lambda \sum_i^n \sum_{k=1}^K v_{ik} (u_i-c_k)^2 + \frac{1}{2} \|Dv\|^2_2 \right)\nonumber \\ \nonumber \end{align*}$

s.t. a box constraint on the image values x and the simplex constraint on the labelling function $v$ . The operator A is the undersampled Radon transform modelling the occluded geometry and y is the given data. The second term is the edge-preserving regularisation term for u, the third term is the segmentation term which aims at finding regions in u that are close to the value c_k in region k. The operator D is the finite difference approximation of the gradient. The non-convex problem is solved by alternating minimisation between updates of $u,v,c$ .

2.4.2. PET and transmission tomography.

In [31], the authors propose a maximum likelihood reconstruction and doubly stochastic segmentation for emission and transmission tomography. In their model they use a hidden Markov measure field model (HMMFM) to estimate the different classes of objects from the given data r. They want to maximise the following cost function:

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle E(u,p,\theta)= \log P(r|u) + \log P(u|p,\theta) + \log P(\,p). \nonumber \end{align*}$

The first term is the data likelihood which will be modelled differently for emission and transmission tomography. The second term is the conditional probability or class fitting term, for which they use HMMFM. The third term is the regularisation on the HMMFM. The optimisation is carried out in three steps, where first they solve for u (image update) fixing $p, \theta$ , then for p , holding $u,\theta$ (measure field update) and finally for $\theta$ (parameter update) having $u,p$ fixed.

A variant of this method has been presented in [32], in which they incorporate prior information about the segmentation classes through a HMMFM. Here, the reconstruction is the minimisation over a constrained Bayesian formulation that involves a data fidelity term as a classical least squares fitting term, a class fitting term as a Gaussian mixture for each pixel given K classes and dependent of the class probabilities defined by the HMMFM, and a regulariser also dependent of the class probabilities. The model to minimise is

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle E(u,\delta) =& \lambda_{\rm noise} \|Au-b\|^2_2 - \sum_{j=1}^N \log \left(\sum_{k=1}^K \frac{\delta_{jk}}{\sqrt{2 \pi} \sigma_k} \exp \left(- \frac{(u_j -\mu_k)^2}{2 \sigma_k^2} \right) \right) + \lambda_{\rm class} \sum_{k=1}^K R(\delta_k) \nonumber \\ {{\rm s.t.}} \quad & \sum_{k=1}^K \delta_{jk}=1, \quad \delta_{jk} \geqslant 0,\quad j=1,\dots,N, \quad k=1,\dots, K. \nonumber \end{align*}$

The operator A will be modelled as the Radon transform in case of CT and b represents the measured data; N is the number of pixel in the image; $\lambda_{\rm noise}$ and $\lambda_{\rm class}$ are the regularisation parameters; $\mu_k, \sigma_k$ are the class parameters. The cost function is non convex and they solve the problem in an alternating scheme where they either update the pixel values or the class probabilities for each pixel.

Storath and others [33] model the joint reconstruction and segmentation using the Potts Model with application to PET imaging and CT. They consider the variational formulation of the Potts model for the reconstruction. Since the solution is piecewise constant, this directly induces a partition of the image domain, thus a segmentation. Given the data f and an operator A (e.g. Radon transform), the energy functional is in the following form:

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle E(u)= \lambda \| \nabla u \|_0 + \|Au-f\|^2_2, \nonumber \end{align*}$

where the first term is the jump penalty enforcing piecewise constant solutions and the second term is the data fidelity. As the Potts model is NP hard, they propose a discretisation scheme that allows to split the Potts problem into subproblems that can be solved efficiently and exactly.

2.4.3. MRI.

In [34], the authors proposed a joint model with application to MRI. Their reconstruction-segmentation model consists of a fitting term and a patch-based dictionary to sparsely represent the image, and a term that models the segmentation as a mixture of Gaussian distributions with mean, standard deviation and mixture weights $\mu$ , $\sigma$ , $\pi$ . Their model is

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle E(u,\Gamma,\mu,\sigma,\pi) = \|Au-y\|^2+\lambda \sum_{n=1}^N \|R_n u - D \gamma_n\|^2 - \beta {{\rm ln}} P(u|\mu,\sigma,\pi) \quad {{\rm s.t.}} \; \|\gamma_n\|_0 \leqslant T \; ~ \forall n, \nonumber \end{align*}$

where A is the undersampled Fourier transform, y is the given data, R_n is a patch extraction operator, $\lambda$ is a weighting parameter, T is the sparsity threshold, and $\gamma_n$ is the sparse representation of patch R_nu organised as column n of the matrix $\Gamma$ . The problem is highly non-convex and it is solved iteratively using conjugate gradient on u, orthogonal matching pursuit on $\Gamma$ and expectation–maximisation algorithm on $(\mu,\sigma,\pi)$ .

2.4.4. Summary.

Recently, the idea to solve the problems of reconstruction and segmentation simultaneously has become more popular. The majority of these joint methods have been proposed for CT, SPECT and PET data. Mainly they differ in the way they encode prior information in terms of regularisers and how they link the reconstruction and segmentation in the coupling term. Some imposes smoothness in the reconstruction [29], others sparsity in the gradient [26, 30, 33], other consider a patch-dictionary sparsifying approach [34]. In [33] they do not explicitly obtain a segmentation, but they force the reconstruction to be piecewise constant. Depending on the application, the coupling term is the data fitting term itself (e.g. SPECT), or the segmentation term. In [31, 32, 34] the authors model the segmentation as a mixture of Gaussian distribution, while [30] has a a region-based segmentation approach similar to what we propose. However, [30] penalises the squared 2-norm of segmentation, imposing spatial smoothness.

In our proposed joint approach, we perform reconstruction and segmentation in a unified Bregman iteration scheme, exploiting the advantage of improving the reconstruction, which results in a more accurate segmentation. Furthermore, the segmentation constitutes another prior imposing regularity in the reconstruction in terms of sharp edges in the regions of interest. We propose a novel numerical optimisation problem in a non-convex Bregman iteration framework for which we present a rigorous convergence result in the following section.

3. Optimisation

The cost function (11) is non-convex in the joint argument $(u,v)$ , but it is convex in each individual variable. To solve this problem we derive a splitting approach where we solve the two minimisation problems in an alternating fashion with respect to u and $v$ . We present the general algorithm and its convergence analysis in the next subsection. First, we describe the solution of each subproblem.

Problem in u. The problem in u reads

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle u^{k+1}= \mathop{{\rm arg}~{\rm min}}\limits_{u} \frac{1}{2} \| Au - f\|_{2}^{2} + \alpha ({\rm TV}(u) - \langle\,p^k,u \rangle) + \delta\sum_{i=1}^n \sum_{j=1}^{\ell} v_{ij}^k (c_j - u_i)^2. \nonumber \end{align*}$

We solve the optimisation for u, fixing $v$ , using the primal-dual algorithm proposed in [35–38]. We write $F(u)= \| u \|_1$ , $K(u)=\nabla u$ and $\newcommand{\e}{{\rm e}} G(u) = \frac{1}{2} \| Au - f\|_{2}^{2} - \alpha \langle\,p^k,u \rangle +$ $\newcommand{\e}{{\rm e}} \delta \sum\nolimits_{i=1}^n \sum\nolimits_{j=1}^{\ell} v_{ij}^k (c_j - u_i){}^2$ and obtain the following iterates for $\theta=1$ and step sizes $\sigma=\tau=0.99 / \| \nabla \|$

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle y^{n+1} &= \frac {y^n + \sigma \nabla \bar{u}^n}{\max (1, \| y^n + \sigma \nabla \bar{u}^n \|)} \nonumber \\ u^{n+1} &= \frac{u^n + \tau \nabla \cdot y^{n+1} + 2 \tau \delta \sum\nolimits_{j=1}^\ell v_j ^k c_j + \tau \alpha p^k +\tau A^* f}{1+ 2 \tau \delta + \tau A^*A} \nonumber \\ \bar{u}^{n+1} &= 2u^{n+1}-u^n. \nonumber \end{align*}$

After sufficiently many iterations we set $u^{k+1}=u^{n+1}$ and compute the update p ^k+1 from the optimality condition of (3) as (11b).

Problem in $v$ . The problem in $v$ reads

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle v^{k+1} = \mathop{{\rm arg}~{\rm min}}\limits_{v\in C} \langle v, \delta g - \beta q^k \rangle + \beta {\rm TV}(v) \nonumber \end{align*}$

with $\newcommand{\e}{{\rm e}} \newcommand{\bi}{\boldsymbol} g = \big((c_1 - u^{k+1}){}^2 , \dots, (c_\ell- u^{k+1}){}^2 \big){}^T$ . We now solve a variant of the primal-dual method [35] as suggested in [38, 39]. They consider the general problem including pointwise linear terms of the form

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \min_{x\in C} \max_{y\in B} \langle Kx, y \rangle + \langle g,x \rangle - \langle h, y\rangle, \nonumber \end{align*}$

where $C \subseteq X$ , $B\subseteq Y$ are closed, convex sets.

Setting $K=\nabla$ and h = 0, $\theta=1$ and step sizes $\sigma=\tau=0.99 / \| \nabla \|$ , the updates are

$\begin{align*} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle w^{n+1} &= \Pi_B \big(w^n+\sigma (\nabla \bar{v}^n-h)\big) \nonumber \\ v^{n+1} &= \Pi_C \big(v^n + \tau \nabla \cdot (v^{n+1} - \delta g + \beta q^k)\big)\nonumber \\ \bar{v}^{n+1} &= 2v^{n+1}-v^n. \nonumber \end{align*}$

At the end, we set $v^{k+1}=v^{n+1}$ and obtain the update q^k+1 as (11d).

3.1. Convergence analysis

The proposed joint approach (11) is an optimisation problem of the form

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \min_{u,v} E(u,v) + D_{J_1}^{\,p^k}(u,u^k)+ D_{J_2}^{q^k}(v,v^k) \label{eq:P} \nonumber \end{align} \tag{ 12 }$

in the general Bregman distance framework for (nonconvex) functions $E: \mathbb{R}^n \times \mathbb{R}^m \to$ $\mathbb{R} \cup \{\infty\}$ , for $k \in \{0, \dots, N\}$ and some positive parameters $\alpha$ and $\beta$ . The functions $J_1:\mathbb{R}^n \to \mathbb{R} \cup \{\infty \}$ and $J_2:\mathbb{R}^m \to \mathbb{R} \cup \{\infty \}$ impose some regularity in the solution. In this work we consider a finite dimensional setting and we refer to the next section for the required definitons. To prove global convergence of (12), we consider functions that satisfy the Kurdika–Łojasiewicz property, defined below, and we make the following assumptions.

Definition 1 (Kurdyka–Łojasiewicz (KL) property). Let $F: \mathbb{R}^d \to \mathbb{R}$ be a proper and lower semicontinuous function.

Then the function F is said to have the KL property at $\newcommand{\e}{{\rm e}} \bar{u}\in {{\rm dom}}(\partial F) := \{u \in \mathbb{R}^d | \partial F \neq \emptyset \}$ if there exists a constant $\newcommand{\e}{{\rm e}} \eta \in (0, \infty]$ , a neighbourhood N of $\bar{u}$ and a concave function $\newcommand{\e}{{\rm e}} \varphi: [0,\eta) \to \mathbb{R}_{>0}$ that is continuous at 0 and satisfies $\varphi(0)=0$ , $\newcommand{\e}{{\rm e}} \varphi \in C^1(]0,\eta[)$ , and $\varphi'(s)>0$ for all $\newcommand{\e}{{\rm e}} s\in]0,\eta [$ , such that for all $\newcommand{\e}{{\rm e}} u\in N \cap \{u \in \mathbb{R}^d | F(\bar{u}) < F(u)<F(\bar{u})+\eta\}$ the inequality
$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \varphi'(F(u) -F(\bar{u})) {{\rm dist}}(0,\partial F(u)) \geqslant 1 \hspace{+4cm} ({\rm KL}) \nonumber \end{align*}$
holds.
If F satisfies the KL property at each point of ${{\rm dom}}(\partial F)$ , F is called a KL function.

Lemma 1. The function $\newcommand{\e}{{\rm e}} E(u,v)=\frac{1}{2} \| Au - f\|_{2}^{2} + \delta \sum\nolimits_{i=1}^n \sum\nolimits_{j=1}^{\ell} v_{ij} (c_j - u_i){}^2$ in our joint problem (11) satisfies the KL property over $\mathbb{R}^n \times \mathbb{R}^m$ .

Proof. It has been proved in [40] that real-analytic functions satisfy the KL property. The function $E(u,v)$ is polynomial and therefore it is a real-analytic function. ▪

Assumption 1.

(i)
E is a C¹ function
(ii)
$E>-\infty$
(iii)
E is a KL function
(iv)
$J_i: \mathbb{R}^n \to \mathbb{R}$ , $i=1,2$ , are proper, lower semi-continuous (l.s.c.) and strongly convex
(v)
J_i, $i=1,2$ , are KL function
(vi)
for any fixed $v$ , the function $u \to E(u,v)$ is convex. Likewise for any fixed u, the function $v \to E(u,v)$ is convex.
(vii)
for any fixed $v$ , the function $u \to E(u,v)$ is $C^{1}_{L_1(v)}$ , hence the partial gradient is $L_1(v)$ -Lipschitz continuous
$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \| \nabla_u E(u_1,v) - \nabla_u E(u_2,v) \| \leqslant L_1(v) \| u_1 - u_2 \| \qquad \forall u_1,u_2 \in \mathbb{R}^n. \nonumber \end{align*}$
Likewise for any fixed u, the function $v \to E(u,v)$ is $C^{1}_{L_2(u)}$ .

We want to study the convergence properties of the alternating scheme

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle u^{k+1} = \mathop{{\rm arg}~{\rm min}}\limits_{u} \left\{E(u, v^k) + D_{J_1}^{\,p^k} (u,u^k) \right\} \label{eq:A1.1} \nonumber \end{align} \tag{ 13a }$

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle p^{k+1} = p^k - \nabla_u E(u^{k+1}, v^k) \label{eq:A1.2} \nonumber \end{align} \tag{ 13b }$

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle v^{k+1} = \mathop{{\rm arg}~{\rm min}}\limits_{v} \left\{E(u^{k+1}, v) + D_{J_2}^{q^k} (v,v^k) \right\}\label{eq:A1.3} \nonumber \end{align} \tag{ 13c }$

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle q^{k+1} = q^k - \nabla_v E(u^{k+1}, v^{k+1}) \label{eq:A1.4} \nonumber \end{align} \tag{ 13d }$

for initial values $(u^0,v^0)$ , $p^0 \in \partial J_1(u^0)$ and $q^0 \in \partial J_2(v^0)$ .

We want to show that the whole sequence generated by (13) converges to a critical point of E.

Algorithm 1. Alternating splitting method with Bregman iterations for two blocks.

Initialization: $(u^0,v^0)$

$(u^0,v^0)$ , $p^0 \in \partial J_1(u^0)$ $p^0 \in \partial J_1(u^0)$ , $q^0 \in \partial J_2(v^0)$ $q^0 \in \partial J_2(v^0)$ , $N\in \mathbb{N}$ $N\in \mathbb{N}$

for $k=0,1,\dots, N$ $k=0,1,\dots, N$ do

$\newcommand{\bi}{\boldsymbol} u^{k+1} = \mathop{{\rm arg}~{\rm min}}\limits_{u} \big\{E(u, v^k) + D_{J_1}^{\,p^k} (u,u^k) \big\}$ $\newcommand{\bi}{\boldsymbol} u^{k+1} = \mathop{{\rm arg}~{\rm min}}\limits_{u} \big\{E(u, v^k) + D_{J_1}^{\,p^k} (u,u^k) \big\}$

$p^{k+1} = p^k - \nabla_u E(u^{k+1}, v^k)$ $p^{k+1} = p^k - \nabla_u E(u^{k+1}, v^k)$

$\newcommand{\bi}{\boldsymbol} v^{k+1} = \mathop{{\rm arg}~{\rm min}}\limits_{v} \big\{E(u^{k+1}, v) + D_{J_2}^{q^k} (v,v^k) \big\}$ $\newcommand{\bi}{\boldsymbol} v^{k+1} = \mathop{{\rm arg}~{\rm min}}\limits_{v} \big\{E(u^{k+1}, v) + D_{J_2}^{q^k} (v,v^k) \big\}$

$q^{k+1} = q^k - \nabla_v E(u^{k+1}, v^{k+1})$ $q^{k+1} = q^k - \nabla_v E(u^{k+1}, v^{k+1})$

end for

In order for the updates (13a) and (13c) to exist, we want J to be of the form $J=R+\varepsilon G$ (e.g. $R = \| \nabla u \|_1$ and $G= \| u \|^2_2$ , see [41]) where R and G fulfil the following assumptions. In practice, we verify that G does not significantly change the reconstruction and segmentation performance for the examples we consider in the next section, for sufficiently small parameter (e.g. $\varepsilon=10^{-3}$ ). Therefore, in our model (11) and in the numerical results we omit it.

Assumption 2.

(i)
The functions $G_1:\mathbb{R}^n \to \mathbb{R}$ and $G_2:\mathbb{R}^m \to \mathbb{R}$ are strongly convex with constants $\gamma_1$ and $\gamma_2$ , respectively. They have Lipschitz continuous gradient $\nabla G_1$ and $\nabla G_2$ with Lipschitz constant $\delta_1$ and $\delta_2$ , respectively.
(ii)
The functions $R_1:\mathbb{R}^n \to \mathbb{R}$ and $R_2:\mathbb{R}^m \to \mathbb{R}$ are proper, l.s.c. and convex.

For $J_i=\alpha_i R_i+\varepsilon_i G_i$ , $i \in \{1,2\}$ , we can write (13) as

$\begin{align} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle u^{k+1} = \mathop{{\rm arg}~{\rm min}}\limits_{u} \big\{E(u, v^k) + \alpha_1 D_{R_1}^{\,p^k} (u,u^k)+ \varepsilon_1 D_{G_1}(u,u^k) \big\} \label{eq:A2.1} \nonumber \end{align} \tag{ 14a }$

$\begin{align} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle p^{k+1} = p^k - \frac{1}{\alpha_1} \Big(\nabla_u E(u^{k+1}, v^k) + \varepsilon_1 \big(\nabla G_1(u^{k+1})- \nabla G_1(u^k) \big) \Big) \label{eq:A2.2} \nonumber \end{align} \tag{ 14b }$

$\begin{align} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle v^{k+1} = \mathop{{\rm arg}~{\rm min}}\limits_{v} \big\{E(u^{k+1}, v) + \alpha_2 D_{R_2}^{q^k} (v,v^k)+ \varepsilon_2 D_{G_2}(v,v^k)\big\}\label{eq:A2.3} \nonumber \end{align} \tag{ 14c }$

$\begin{align} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle q^{k+1} = q^k - \frac{1}{\alpha_2} \Big(\nabla_v E(u^{k+1}, v^{k+1}) + \varepsilon_2 \big(\nabla G_2(v^{k+1})- \nabla G_2(v^k)\big) \Big). \label{eq:A2.4} \nonumber \end{align} \tag{ 14d }$

Theorem 1 (Global convergence). Suppose E is a KL function for any $z^k=(u^k,v^k) \in \mathbb{R}^n \times \mathbb{R}^m$ and $r^k=(\,p^k,q^k)$ with $p^k \in \partial R_1(u^k)$ , $q^k \in \partial R_2(v^k)$ . Assume assumptions 1 and 2 hold. Let $\{z^k \}_{k\in \mathbb{N}}$ and $\{r^k \}_{k\in \mathbb{N}}$ be sequences generated by (14), which are assumed to be bounded. Then

(i)
The sequence $\{z^k\}_{k \in \mathbb{N}}$ has finite length, that is
$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \sum_{k=0}^{\infty} \| z^{k+1} - z^k \| < \infty. \nonumber \end{align} \tag{ 15 }$
(ii)
The sequence $\{z^k\}_{k\in \mathbb{N}}$ converges to a critical point $\bar{z}$ of E.

3.2. Proof of theorem 1

In the following we are going to show global convergence of this algorithm. The first step in our convergence analysis is to show a sufficient decrease property of a surrogate of the energy function (12) and a subgradient bound of the norm of the iterates gap. We first recall the following definitions.

Definition 2 (Convex conjugate). Let G be a proper, l.s.c. and convex function. Then its convex conjugate $G^*:\mathbb{R}^n \to \mathbb{R} \cup \{\infty \}$ is defined as

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle G^*(\,p):= \sup_{u\in \mathbb{R}^n} \{\langle u,p\rangle -G(u) \}, \nonumber \end{align*}$

for all $p \in \mathbb{R}^n$ .

Lemma 2. Let G be a proper, l.s.c. and convex function and G^* its convex conjugate. Then for all arguments $u\in \mathbb{R}^n$ with corresponding subgradients $p\in \partial G(u)$ we know

$\langle u,p \rangle = G(u) +G^*(\,p)$ ,
$p\in \partial G(u)$ is equivalent to $u\in \partial G^*(\,p)$ .

From lemma 2 we can rewrite the Bregman distance in (3) as follows:

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle D_J^{\,p^k}(u,u^k)=J(u)+J^*(\,p^k)-\langle u, p^k \rangle, \nonumber \end{align} \tag{ 16 }$

where we can see that now it does not depend on u^k anymore, but it can be defined as a function of u and p ^k only, $D_J(u,p^k)$ .

Definition 3 (Strong convexity). Let G be a proper, l.s.c. and convex function. Then G is said to be $\gamma$ -strongly convex if there exists a constant $\gamma$ such that

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle D_G^{\,p}(u,v) \geqslant \frac{\gamma}{2} \|u -v \|^2 \nonumber \end{align*}$

holds true for all $u,v \in {\rm dom}(G)$ and $q \in \partial G(v)$ .

Definition 4 (Symmetric Bregman distance). Let G be a proper, l.s.c. and convex function. Then the symmetric generalised Bregman distance $D_G^{\rm symm} (u,v)$ is defined as

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle D_G^{\rm symm} (u,v):= D_G^{\,p}(u,v) + D_G^q (v,u) = \langle\,p-q,u-v \rangle \nonumber \end{align*}$

for $u,v \in {\rm dom}(G)$ with $p\in \partial G(u)$ and $q \in \partial G(v)$ . We also observe that in case G is $\gamma$ -strongly convex we have

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle D_G^{\rm symm}(u,v) \geqslant \gamma\|u -v \|^2. \nonumber \end{align*}$

Definition 5 (Lipschitz continuity). A function $G: \mathbb{R}^n\to \mathbb{R}$ is (globally) Lipschitz-continuous if there exists a constant L > 0 such that

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \|G(u)-G(v)\| \leqslant L \| u -v\| \nonumber \end{align*}$

is satisfied for all $u,v \in \mathbb{R}^n$ .

Before we show global convergence, we first define the surrogate functions.

Definition 6 (Surrogate objective). Let $E, R_i, G_i, i\in\{1,2\}$ satisfy assumptions 1 and 2, respectively. For any $(u^k,v^k) \in \mathbb{R}^n \times \mathbb{R}^m$ and subgradients $p^k \in \partial R_1(u^k)$ and $q^k \in \partial R_2(v^k)$ , we define the following surrogate objectives F, F₁ and F₂

$\begin{align} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle \nonumber F(u^{k+1},v^{k+1},p^k, q^k) =E(u^{k+1},v^{k+1}) &+ \alpha_1\underbrace{\big(R_1(u^{k+1})+R_1^*(\,p^k)-\langle u^{k+1},p^k \rangle\big)}_{{= D_{R_1}^{\,p^{k}}(u^{k+1},u^{k})}}\nonumber \\ &+ \alpha_2 \underbrace{\big(R_2(v^{k+1})+R_2^*(q^k)-\langle v^{k+1},q^k \rangle\big)}_{{= D_{R_2}^{q^{k}}(v^{k+1},v^{k})}}, \label{eq:surr} \nonumber \end{align} \tag{ 17 }$

$\begin{align} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle F_1(u^{k+1},p^k) =E(u^{k+1},v^{k+1}) + \alpha_1\big(R_1(u^{k+1})+R_1^*(\,p^k)-\langle u^{k+1},p^k \rangle \big), \nonumber \end{align} \tag{ 18 }$

$\begin{align} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle F_2(v^{k+1},q^k) =E(u^{k+1},v^{k+1}) + \alpha_2 \big(R_2(v^{k+1})+R_2^*(q^k)-\langle v^{k+1},q^k \rangle \big). \nonumber \end{align} \tag{ 19 }$

For convenience we will use the following notations:

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle z^k:=&(u^k,v^k) \qquad \forall k \geqslant 0 \nonumber \\ r^k:=&(\,p^k,q^k) \qquad p^k \in \partial R_1(u^k), \: q^k \in \partial R_2(v^k). \nonumber \end{align*}$

The surrogate function F will then read

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle F(z^{k+1},r^k)=F(u^{k+1},v^{k+1},p^k, q^k). \nonumber \end{align*}$

We can now show the sufficient decrease property of (17) for subsequent iterates.

Lemma 3 (Sufficient decrease property). The iterates generated by (14) satisfy the descent estimate

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle F(z^{k+1},r^k) + \rho_2 \|z^{k+1}-z^k\|^2 \leqslant F(z^{k},r^{k-1}). \label{eq:decrease} \nonumber \end{align} \tag{ 20 }$

In addition we observe

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \lim_{k \to \infty} D_{R_1}^{\rm symm} (u^{k+1},u^k) = 0 \qquad \lim_{k \to \infty} D_{R_2}^{\rm symm}(v^{k+1},v^k)=0 \nonumber \\ \lim_{k \to \infty} D_{G_1}^{\rm symm} (u^{k+1},u^k) =0 \qquad \lim_{k \to \infty} D_{G_2}^{\rm symm}(v^{k+1},v^k)=0. \nonumber \end{align*}$

Proof. From (12) we consider the following step for $J_1=\alpha_1R_1+\varepsilon_1G_1$ :

$\begin{align*} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle u^{k+1} =& \mathop{{\rm arg}~{\rm min}}\limits_{u} \big\{E(u, v^k) + \alpha_1 D_{R_1}^{\,p^k} (u,u^k) +\varepsilon_1 D_{G_1}(u,u^k) \big\} \nonumber \\ =& \mathop{{\rm arg}~{\rm min}}\limits_{u} \big\{E(u, v^k) + \alpha_1 R(u) + \varepsilon_1 G(u) - \langle \alpha_1 p^k +\varepsilon_1 \nabla G(u^k) , u -u^k\rangle \big\}. \nonumber \end{align*}$

Computing the optimality condition we obtain

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \alpha_1 (\,p^{k+1} - p^k) + \nabla_u E (u^{k+1},v^k) + \varepsilon_1 (\nabla G(u^{k+1})- \nabla G(u^k))=0. \nonumber \end{align*}$

Taking the dual product with $u^{k+1} - u^k$ yields

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \alpha_1 \underbrace{\langle\,p^{k+1} - p^k,u^{k+1} - u^k \rangle}_{{=D_{R_1}^{\rm symm}(u^{k+1},u^k)}} + \underbrace{\langle \nabla_u E (u^{k+1},v^k),u^{k+1} - u^k \rangle}_{{\geqslant E(u^{k+1}, v^k) -E(u^{k}, v^k)}} \nonumber \\ + \varepsilon_1 \underbrace{\langle \nabla G_1(u^{k+1})- \nabla G_1(u^k),u^{k+1} - u^k \rangle}_{{=D_{G_1}^{\rm symm}(u^{k+1},u^k)}}=0. \nonumber \end{align*}$

Using the convexity estimate $E(u^{k+1},v^k) - E(u^k,v^k) \leqslant - \langle \nabla_u E (u^{k+1},v^k),u^{k+1} - u^k \rangle$ we obtain the inequality

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \alpha_1 D_{R_1}^{\rm symm}(u^{k+1},u^k) + \varepsilon_1 D_{G_1}^{\rm symm}(u^{k+1},u^k) + E(u^{k+1}, v^k) -E(u^{k}, v^k) \leqslant 0 \nonumber \end{align*}$

$\begin{align*} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle \alpha_1 \big(D_{R_1}^{\,p^k}(u^{k+1},u^k) +D_{R_1}^{\,p^{k+1}}(u^{k},u^{k+1})\big) + \varepsilon_1 D_{G_1}^{\rm symm}(u^{k+1},u^k) + E(u^{k+1},v^k) \nonumber \\ \leqslant E(u^{k},v^k). \nonumber \end{align*}$

Adding $\alpha_1 D_{R_1}^{\,p^{k-1}}(u^{k},u^{k-1})$ to both sides, using the strong convexity of G₁ and the surrogate function notation, we get

$\begin{align*} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle F_1(u^{k+1},p^k)+ \alpha_1 \big(D_{R_1}^{\,p^{k+1}}(u^{k},u^{k+1}) + D_{R_1}^{\,p^{k-1}}(u^{k},u^{k-1})\big) + \varepsilon_1 \gamma_1 \|u^{k+1} -u^k\|^2 \leqslant F_1(u^{k},p^{k-1}). \nonumber \end{align*}$

Using the trivial estimate for the Bregman distances, we get the decrease property

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle F_1(u^{k+1},p^k)+ \varepsilon_1 \gamma_1 \|u^{k+1} -u^k\|^2 \leqslant F_1(u^{k},p^{k-1}). \nonumber \end{align*}$

Similarly for $v$ , we obtain

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle F_2(v^{k+1},q^{k})+ \varepsilon_2 \gamma_2 \|v^{k+1} -v^k \|^2 \leqslant F_2(v^{k},q^{k-1}). \nonumber \end{align*}$

Summing up these estimates, we verify the sufficient decrease property (20), with positive $\rho_2 = \max \{\varepsilon_1 \gamma_1, \varepsilon_2 \gamma_2 \}$ . We also observe

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle 0 \leqslant \Delta^k \leqslant E(z^k) -E(z^{k+1}) \nonumber \end{align*}$

with

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \Delta^k := \alpha_1 D_{R_1}^{\rm symm} (u^{k+1},u^k) + \alpha_2 D_{R_2}^{\rm symm}(v^{k+1},v^k) + \varepsilon_1 D_{G_1}^{\rm symm} (u^{k+1},u^k)+ \varepsilon_2 D_{G_2}^{\rm symm}(v^{k+1},v^k). \nonumber \end{align*}$

Summing over $k=0,\dots,N$ :

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \sum_{k=0}^{N} \Delta^k \leqslant \sum_{k=0}^{N} E(z^k) -E(z^{k+1}) = E(z^0) - E(z^{N+1}) \leqslant E(z^0) - \inf_z E(z) < \infty. \nonumber \end{align*}$

Taking the limit $N \to \infty$ implies

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \sum_{k=0}^{\infty} \Delta^k < \infty \nonumber \end{align*}$

thus $\lim\nolimits_{k \to \infty} D_{R_1}^{\rm symm} (u^{k+1},u^k) = 0$ , $\lim\nolimits_{k \to \infty} D_{G_1}^{\rm symm}=0$ , $\lim\nolimits_{k \to \infty} D_{R_2}^{\rm symm}(v^{k+1},v^k)=0$ ,

$\lim\nolimits_{k \to \infty}D_{G_2}^{\rm symm}(v^{k+1},v^k)=0$ , due to $\alpha_1$ , $\alpha_2$ , $\varepsilon_1$ , $\varepsilon_2 >0$ . ▪

In order to show that the sequences generated by (14) approach the set of critical point we first estimate a bound for the subgradients of the surrogate functions and verify some properties of the limit point set. We first write the subdifferential of the surrogate function as

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle w^{k+1}:=\left(\begin{array}{@{}c@{}} \nabla_u E (u^{k+1},v^{k+1}) + \alpha_1 (\,p^{k+1}-p^k) \nonumber \\ \nabla_v E (u^{k+1},v^{k+1})+ \alpha_2 (q^{k+1}-q^k) \nonumber \\ u^k-u^{k+1} \nonumber \\ v^k-v^{k+1} \end{array}\right) \in \partial F(z^{k+1},r^k), \label{eq:w} \nonumber \end{align} \tag{ 21 }$

with $p^k\in\partial R_1(u^k)$ and $q^k \in \partial R_2(v^k)$ being equivalent to $u^k \in \partial R_1^* (\,p^k)$ and $v^k \in \partial R_2^* (q^k)$ , respectively.

Lemma 4 (A subgradient lower bound for the iterates gap). Suppose assumptions 1 and 2 hold. Then the iterates (14) satisfy

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \|w^{k+1}\| \leqslant \rho_1 \| z^{k+1} - z^k \| \label{eq:bound} \nonumber \end{align} \tag{ 22 }$

$w^{k+1}\in \partial F(z^{k+1},r^k)$ as defined in (21) and $\rho_1 = \max \{1+\varepsilon_1 \delta_1, 1+\varepsilon_2 \delta_2 + L_2 \}$ .

Proof. From (21) we know

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \|w^{k+1}\|& \leqslant \| \nabla_u E (u^{k+1},v^{k+1}) +\alpha_1 (\,p^{k+1}-p^k)\|+\|\nabla_v E (u^{k+1},v^{k+1}) +\alpha_2 (q^{k+1}-q^k)\| \nonumber \\ & +\| u^k-u^{k+1}\| +\| v^k-v^{k+1}\|. \nonumber \end{align*}$

From the optimality conditons of (14b) and (14d), we compute

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \|w^{k+1}\|& \leqslant \| \nabla_u E (u^{k+1},v^{k+1}) +\alpha_1 (\,p^{k+1}-p^k)\|+\|\nabla_v E (u^{k+1},v^{k+1}) +\alpha_2 (q^{k+1}-q^k)\| \nonumber \\ & +\| u^k-u^{k+1}\| +\| v^k-v^{k+1}\|\nonumber \\ &= \varepsilon_1 \underbrace{\| \nabla G_1 (u^{k+1}) - \nabla G_1 (u^k) \|}_{\leqslant \delta_1 \|u^{k+1}-u^k \|} + \underbrace{\|\nabla_u E (u^{k+1},v^{k+1}) - \nabla_u E (u^{k+1},v^k) \|}_{\leqslant L_2 \|v^{k+1}-v^k \|} \nonumber \\ &+ \varepsilon_2 \underbrace{\| \nabla G_2 (v^{k+1}) - \nabla G_2 (v^k) \|}_{\leqslant \delta_2 \|v^{k+1}-v^k \|} + \|u^{k+1}-u^k \| + \|v^{k+1}-v^k \| \nonumber \\ & \leqslant (1+\varepsilon_1 \delta_1) \|u^{k+1}-u^k \| + (1+\varepsilon_2 \delta_2 + L_2) \|v^{k+1}-v^k \| \nonumber \\ &\leqslant \rho_1 \|z^{k+1}-z^k\|, \nonumber \end{align*}$

with $\rho_1 = \max \{1+\varepsilon_1 \delta_1, 1+\varepsilon_2 \delta_2 + L_2 \}$ . Here we used the Lipschitz-continuity of $\nabla G_i$ and $\nabla E$ . ▪

Following [41, 42], we verify some properties of the limit point set. Let $\{z^k \}_{k\in \mathbb{N}}$ and $\{r^k \}_{k\in \mathbb{N}}$ be sequences generated by (14). The set of limit points is defined as

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \omega(z^0,r^0) := \Big\{&(\bar{z},\bar{r}) \in \mathbb{R}^n \times \mathbb{R}^n : \exists {{\rm ~an~increasing~sequence~of~integers~}} \{k_j \}_{j \in \mathbb{N}}\nonumber \\ &{{\rm such~that}} \lim_{j \to \infty} z^{k_j} = \bar{z}{{\rm ~and~}} \lim_{j \to \infty} r^{k_j} = \bar{r} \Big\}. \nonumber \end{align*}$

As in [41, definition 5.4, proposition 5.5], we are going to assume that R_i, $i=1,2$ has locally bounded subgradients.

Lemma 5. Suppose assumptions 1 and 2 hold. Let $\{z^k\}_{k \in \mathbb{N}}$ be a sequence generated by (14) which is assumed to be bounded. Let $(\bar{z},\bar{r}) \in \omega(z^0,r^0)$ . Then the following assertion holds:

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \lim_{k\to \infty} F(z^{k+1},r^k) = F(\bar{z},\bar{r}) =E(\bar{z}). \label{eq:crit} \nonumber \end{align} \tag{ 23 }$

Proof. Since $(\bar{z},\bar{r})$ is a limit point of $\{(z^k,r^{k})\}_{k \in \mathbb{N}}$ , $\{(z^k,r^{k})\}_{k \in \mathbb{N}}$ , there exist subsequences $\{z^{k_j}\}_{j \in \mathbb{N}}$ and $\{r^{k_j}\}_{j \in \mathbb{N}}$ such that $\lim\nolimits_{j \to \infty} z^{k_j} = \bar{z}$ and $\lim\nolimits_{j \to \infty} r^{k_j} = \bar{r}$ , respectively. We immediately obtain

$\begin{align*} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle \lim_{j \to \infty} F(z^{k_j},r^{k_j-1}) &= \lim_{j \to \infty} \big\{E(z^{k_j}) +\alpha_1 D_{R_1}^{\,p^{k_j-1}}(u^{k_j},u^{k_j-1}) + \alpha_2 D_{R_2}^{q^{k_j-1}}(v^{k_j},v^{k_j-1}) \big\}\nonumber \\ &= E(\bar{z}) \nonumber \end{align*}$

due to the continuity of E and $\lim\nolimits_{j \to \infty} D_{R_1}^{\,p^{k_j-1}}(u^{k_j},u^{{k_j}-1}) =0$ and $\lim\nolimits_{j \to \infty} D_{R_2}^{q^{k_j-1}}(v^{k_j},v^{k_j-1}) =0$ . From the sufficient decrease property we conclude (23). ▪

Lemma 6 (Properties of limit point set). The limit point set w(z⁰) is a non empty, compact and connected set, the objective function E is constant on w(z⁰) and we have $\lim\nolimits_{k \to \infty} {{\rm dist}} (z^k, w(z^0)) = 0$ .

Proof. This follows steps as in [42, lemma 5]. ▪

To finally prove global convergence of (14), we will use the following Kurdyka–Łojasiewicz property defined and the result from [42]. Before recalling the definition, we introduce the notion of distance between any subset $S \subset \mathbb{R}^d$ and any point $x \in \mathbb{R}^d$ defined as

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle {{\rm dist}}(x,S):= \left\{\begin{array}{@{}lll@{}} \inf \{\| y - x\| : y\in S\} \qquad &S \neq \emptyset \nonumber \\ \infty &S=\emptyset \end{array}\right., \nonumber \end{align*}$

where $\| \cdot \|$ denotes the Euclidean norm.

Lemma 7 (Uniformised KL property). Let $\Omega$ be a compact set and let $E: \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R} \cup \{\infty \}$ be a proper and l.s.c. function. Assume that E is constant on $\Omega$ and satisfy the KL property at each point in $\Omega$ . Then there exists $\varepsilon>0$ , $\newcommand{\e}{{\rm e}} \eta>0$ and $\newcommand{\e}{{\rm e}} \varphi \in C^1((0,\eta))$ that satisfies the same conditions as in Definition KL, such that for all $\bar{u}\in \Omega$ and all u in

$\begin{align} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle \{u \in \mathbb{R}^n \,\big\vert \, {{\rm dist}}(u,\Omega) < \varepsilon \} \cap \{u \in \mathbb{R}^n \,\big\vert \, E(\bar{z}) < E(z)< E(z) + \eta \} \nonumber \end{align} \tag{ 24 }$

condition KL is satisfied.

Proof. Follows from [42]. ▪

With these results we can now show global convergence of (14).

Proof of theorem 1. By the boundedness assumption on $\{(z^k,r^{k})\}_{k \in \mathbb{N}}$ , there exist converging subsequences $\{z^{k_j}\}_{j \in \mathbb{N}}$ and $\{r^{k_j}\}_{j \in \mathbb{N}}$ such that $\lim\nolimits_{j \to \infty} z^{k_j} = \bar{z}$ and $\lim\nolimits_{j \to \infty} r^{k_j} = \bar{r}$ , respectively. We know from lemma 5 that (23) is satisfied.

(i)
KL property holds for E and therefore for E^k and we write
$\begin{align*} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle \varphi'\big(F(z^{k},r^{k-1}) - E(\bar{z}) \big) {{\rm dist}}\big(0,\partial F(z^{k},r^{k-1}) \big) \geqslant 1. \nonumber \end{align*}$
From lemma 4 we obtain
$\begin{align*} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle \varphi'\big(F(z^{k},r^{k-1}) - E(\bar{z}) \big) \geqslant \rho_1^{-1} \| z^{k} - z^{k-1} \| ^{-1}, \nonumber \end{align*}$
and from the concavity of $\varphi$ we know that
$\begin{align*} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle \varphi\big(F(z^{k},r^{k-1}) - E(\bar{z}) \big) - \varphi\big(F(z^{k+1},r^{k}) - E(\bar{z})\big) \nonumber \\ \geqslant \varphi' \big(F(z^{k},r^{k-1}) - E(\bar{z}) \big) \big(F(z^{k},r^{k-1}) - F(z^{k+1},r^{k}) \big). \nonumber \end{align*}$
Thus, we obtain
$\begin{align*} \newcommand{\bi}{\boldsymbol} \newcommand{\e}{{\rm e}} \displaystyle \frac{\varphi \big(F(z^{k},r^{k-1}) - E(\bar{z}) \big) - \varphi\big(F(z^{k},r^{k-1}) - E(\bar{z}) \big)}{F(z^{k},r^{k-1}) - F(z^{k},r^{k-1})} \geqslant \rho_{1}^{-1}\| z^{k} - z^{k-1} \| ^{-1}. \nonumber \end{align*}$
From (20) with lemma 3 and using the abbreviation
$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \varphi^k:=\varphi(F(z^{k},r^{k-1}) - E(\bar{z})), \nonumber \end{align*}$
it follows
$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \frac{\| z^{k+1} - z^k \|^2} {\| z^{k} - z^{k-1} \|} \leqslant \frac{\rho_1}{\rho_2} (\varphi^k-\varphi^{k+1}). \nonumber \end{align*}$
Multiplying by $\| z^{k} - z^{k-1} \|$ and using Young's inequality ( $2 \sqrt{ab} \leqslant a+ b$ ):
$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle 2 \| z^{k+1} - z^k \| \leqslant \frac{\rho_1} {\rho_2} (\varphi^k-\varphi^{k+1}) + \| z^{k} - z^{k-1} \|. \nonumber \end{align*}$
Summing up from $k=1,\dots, N$ we get
$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \sum_{k=1}^{N} \| z^{k+1} - z^k \| &\leqslant \frac{\rho_1}{\rho_2}(\varphi^1-\varphi^{N+1}) + \|z^1 - z^0 \| + \|z^{N+1} - z^N \|\nonumber \\ &\leqslant \frac{\rho_1}{\rho_2}\varphi^1+ \|z^1 - z^0 \|< \infty. \nonumber \end{align*}$
In addition we observe that the finite length property implies that the sequence $\{z^k\}_{k\in \mathbb{N}}$ is a Cauchy sequence and hence is a convergent sequence. For each z^r and z^s with s > r > l we have
$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle \| z^r - z^s \| = \| \sum_{k=r}^{s-1} z^{k+1} - z^k \| \leqslant \sum_{k=r}^{s-1} \| z^{k+1} - z^k \|. \nonumber \end{align*}$
(ii)
The proof follows in a similar fashion as in [41, lemma 5.9].

▪

Remark 2 (Extension to d blocks). The analysis described above holds for the general setting of d blocks

$\begin{align} \newcommand{\e}{{\rm e}} \displaystyle \min_{\{u_1,\dots,u_d\}} E(u_1, \dots, u_d) + \sum_{i=1}^{n} \alpha_i^k D_{J_i}^{\,p_i^k}(u_i,u_i^k). \label{eq:Pblocks} \nonumber \end{align} \tag{ 25 }$

The update for each of the d blocks then reads

$\begin{align*} \newcommand{\e}{{\rm e}} \displaystyle u_i^{k+1} &= \mathop{{\rm arg}~{\rm min}}\limits_{u_i} \left\{E(u_1^{k+1},u_2^{k+1},\dots,u_{i-1}^{k+1},u_i^{k},u_{i+1}^{k}, \dots, u_d^k) + \alpha_i D_{J_i}^{\,p_i^k}(u_i,u_i^k) \right\} \nonumber \\ p_i^{k+1} &= p_i^k - \frac{1}{\alpha_i} \Big(\nabla_{u_i} E(u_1^{k+1},u_2^{k+1},\dots,u_{i-1}^{k+1},u_i^{k+1},u_{i+1}^{k}, \dots, u_d^k) \Big). \nonumber \end{align*}$

4. Numerical results

In this section we present numerical results for our joint reconstruction and segmentation model described in (11). We demonstrate its advantages and limitations, as well as a discussion on the parameter choice. In the first part, we focus on bubbly flow segmentation for simulated data. In the second part, we show results for real data acquired at the Cancer Research UK, Cambridge Institute, for tumour segmentation.

Quality measure. To assess the performance of the reconstruction we will compare our solutions u with respect to the groundtruth u^gt. As quality measure we use the relative reconstuction error (RRE) and the peak signal to noise ratio (PSNR) defined as

${\rm RRE}(u,u^{gt})=\| u^{gt} - u\|_2 / \| u^{gt} \|_2$
${\rm PSNR}(u,u^{gt})= 10 \log_{10} \left(\frac{\max(u)}{\| u^{gt} - u\|_2 /N}\right)$

For the segmentation quality, we will use the relative segmentation error (RSE) to compare our segmentations $v$ with respect to the true segmentations $v^{gt}$

${\rm RSE}(v, v^{gt})= \frac{1}{N} \sum\nolimits_{i=1}^N \delta_{v^{gt}_i, v_i}$

where N is the number of pixels in the image, $\delta$ is the Kronecker delta function that will count the number of mis-classified pixels.

Before we present our two applications, we show a more detailed result of the phantom brain in figure 1. In this example, we show the TV reconstruction figure 2(b), where the parameter $\alpha$ has been optimised with respect to PSNR and its sequential segmentation figure 2(f) with optimal $\beta$ with respect to RSE. In figures 2(c) and (g) we present Bregman reconstruction and sequential segmentation where the Bregman iteration has been stopped according to the discrepancy principle equation (7) and $\beta$ has been optimised with respect to RSE. These parameter choices for the sequential approaches will be used in the whole paper.

**Figure 2.** We consider 15% of the simulated k-space for the brain phantom, where Gaussian noise ( $\sigma=0.25$ ) was added. We compare results for the total variation reconstruction and total-variation-based Bregman iterative reconstruction and their segmentation in a sequential approach with our joint model. We show that both reconstruction and segmentation are improved. (a) Groundtruth. (b) TV reconstruction, $\alpha=0.2$ , RRE = 0.046, PSNR = 24.87. (c) Bregman reconstruction, $\alpha=1$ , RRE = 0.044, PSNR = 24.98. (d) Joint reconstruction, $\alpha=0.8$ , RRE = 0.036, PSNR = 26.04. (e) Sampling matrix, 15%. (f) Segmentation, $\beta=0.001$ RSE = 0.061. (g) Bregman segmentation, $\beta=0.001$ , RSE = 0.065. (h) Joint segmentation, $\beta=0.001$ , $\delta=0.01$ RSE = 0.057.
Download figure:
Standard image High-resolution image

**Figure 2.** We consider 15% of the simulated k-space for the brain phantom, where Gaussian noise ( $\sigma=0.25$ ) was added. We compare results for the total variation reconstruction and total-variation-based Bregman iterative reconstruction and their segmentation in a sequential approach with our joint model. We show that both reconstruction and segmentation are improved. (a) Groundtruth. (b) TV reconstruction, $\alpha=0.2$ , RRE = 0.046, PSNR = 24.87. (c) Bregman reconstruction, $\alpha=1$ , RRE = 0.044, PSNR = 24.98. (d) Joint reconstruction, $\alpha=0.8$ , RRE = 0.036, PSNR = 26.04. (e) Sampling matrix, 15%. (f) Segmentation, $\beta=0.001$ RSE = 0.061. (g) Bregman segmentation, $\beta=0.001$ , RSE = 0.065. (h) Joint segmentation, $\beta=0.001$ , $\delta=0.01$ RSE = 0.057.
Download figure:
Standard image High-resolution image

In this first result, we clearly see that the joint approach performs much better compared to the separate steps in figures 2(b), (f) and (c), (g). Both reconstruction and segmentation are improved and more details are recovered. We refer to Appendix A for more simulated examples (see figures A1–A3).

4.1. Bubbly flow

The first application considered is the characterisation of bubbly flows using MRI. Bubbly flows are two-phase flow systems of liquid and gas trapped in bubbles, which are common in industrial applications such as bioreactors [43] and hydrocarbon processing units [44]. MRI has been successfully used to characterise the bubble size distribution [45, 46] and the liquid velocity field of bubbly flows [47, 48]; these properties govern the heat and mass transfer between the bubbles and the liquid which ultimately determine the efficiency of these industrial systems. However, when studying fast flowing systems, the acquisition time for fully sample k-space is too long to resolve the temporal changes; the most common method of breaking the temporal resolution barrier is through under-sampling. It is therefore critical to develop reconstruction techniques for highly under-sampled k-space data for the accurate reconstruction of the MRI images which would be subsequently used in calculating the bubble size distribution or in studying the hydrodynamics of the system.

We apply our joint reconstruction and segmentation approach to simulated bubbly flow imaging. In figure 3 we present some results for synthetic data, where figure 3(a) represents the groundtruth magnitude image, from which we simulate its k-space following the forward model described in (1). From the full k-space we collect 8% of the samples using the sampling matrix in figure 3(e) and we corrupt the data with Gaussian noise of standard deviation $\sigma = 0.35$ . In figures 3(b) and (f) we show the results for the total variation regularised reconstruction and its segmentation performed sequentially. In the same sequential way, we show the results for the Bregman iterative regularization in figures 3(c) and (g). In the last column in figures 3(d) and (h), we finally show the results for our joint approach. Although the TV and the Bregman approaches are already quite good, we can see that both RRE and PSNR are improved using our model in the reconstruction and the segmentation. Smaller details, such as the top right bubble contour, are better detected when solving the joint problem. As the goal of the bubbly flow application is to detect bubble size distribution, this improvement is really advantageous.

**Figure 3.** Results of the ${\rm TV}$ reconstruction and Bregman iterative reconstruction and their segmentation in the sequential approach are compared with our joint model. Both MSE and SSIM are improved in the joint approach. The data was corrupted with Gaussian noise with $\sigma=0.35$ . (a) Groundtruth. (b) ${\rm TV}$ reconstruction, $\alpha=0.1$ , RRE = 0.081, PSNR = 18.42. (c) Bregman reconstruction, $\alpha=2$ , RRE = 0.069, PSNR = 18.83. (d) Joint reconstruction, $\alpha=0.8$ , RRE = 0.058, PSNR = 20.7105. (e) Sampling matrix, 8%. (f) Segmentation, $\beta=0.001$ , RSE = 0.0093. (g) Bregman segmentation, $\beta=0.001$ , RSE = 0.017. (h) Joint segmentation, $\beta=0.001$ , $\delta=1$ , RSE = 0.0102.
Download figure:
Standard image High-resolution image

We tested the robustness of our approach by corrupting the data with different signal to noise ratio (SNR) and by considering different amount of sampling. In figure 5 we show in the top row the reconstructions obtained with the joint model for different SNR (which corresponds to different standard deviation $\sigma$ ) and in the bottom row the corresponding segmentation obtained by the joint approach. To complement this information, we show in figure 6 how the PSNR, RRE and RSE are affected, for the joint approach (blue lines) and for the separate approaches, TV (red dotted lines) and Bregman TV (green dotted lines). As expected, with the SNR increasing the error decreases. We can see that the joint approach performs better than the sequential approach for any SNR. The improvement is even more significant for very noisy data. As in practice we often observe high levels of noise, the joint approach is able to takle this problem better than the traditional sequential approaches.

**Figure 4.** Top row: Reconstructions obtained by the joint model with different SNR. Bottom row: Corresponding segmentations. (a) SNR = 10.56, $\sigma = 0.70$ . (b) SNR = 12.69, $\sigma = 0.56$ . (c) SNR = 16.68, $\sigma = 0.35$ . (d) SNR = 32.83, $\sigma = 0.06$ .
Download figure:
Standard image High-resolution image

**Figure 4.** Top row: Reconstructions obtained by the joint model with different SNR. Bottom row: Corresponding segmentations. (a) SNR = 10.56, $\sigma = 0.70$ . (b) SNR = 12.69, $\sigma = 0.56$ . (c) SNR = 16.68, $\sigma = 0.35$ . (d) SNR = 32.83, $\sigma = 0.06$ .
Download figure:
Standard image High-resolution image

**Figure 5.** Top row: Reconstructions obtained by the joint model with different sampling rates. Bottom row: Corresponding segmentations. The joint reconstruction and segmentation is able to detect the main structures down to 5% of the samples. Up to 2% the results are less clean but still acceptable. Using only 1% of the data is not enough to produce the image and segmentation. (a) 25%. (b) 12.5%. (c) 8%. (d) 5%. (e) 3%. (f) 2%. (g) 1%.
Download figure:
Standard image High-resolution image

**Figure 6.** Error plots for different SNR. From left to right, we show the PSNR, RRE and RSE, respectively, for different levels of noise in the measurements. The blue lines represent the error for our joint approach, while the red and green dotted lines are for the sequential TV and sequential Bregman TV approaches. For each SNR, the joint model performs better than the separate methods. This improvement is even more significant for noisier data, which is highly advantageous as in practice we often observe lower SNR. (a) PSNR. (b) RRE. (c) RSE.
Download figure:
Standard image High-resolution image

It is also interesting to investigate how the joint approach performs with very low undersampling rates. In figure 3(e) we show joint reconstructions (top row) and corresponding segmentations (bottom row) for decreasing sampling rates. We can see that up to 5% results are still very good. Using 3 and 2% of the samples the results are less clean but it is possible to identify the main structures. In contrast, 1% sampling is not enough to retrieve a good image reconstruction and consequently its segmentation. In figure 7, we plot PSNR, RRE and RSE for different sampling rates. The blue lines represent the error for our joint approach, while the red and green dotted lines are for the sequential TV and sequential Bregman TV approaches. We can see that up to 5% sampling the error measures do not change significantly. However, for lower rates, the improvement is more significant. This is highly beneficial for the bubbly flow application as increasing the temporal resolution is really important to keep track of the gas flowing in the pipe.

**Figure 7.** Error plots varying sampling rate. From left to right, we show the PSNR, RRE and RSE, respectively, for different levels of noise in the measurements. The blue lines represent the error for our joint approach, while the red and green dotted lines are for the sequential TV and sequential Bregman TV approaches. The joint appraoch performs better than the sequential cases. The gain is not very significant for higher sampling rates, but it becomes more important for lower rates, starting from 3%. (a) PSNR. (b) RRE. (c) RSE.
Download figure:
Standard image High-resolution image

4.2. Cancer imaging

In this subsection, we illustrate the performance of the joint model for real cancer data. At the Cancer Research UK, Cambridge Institute, researchers acquire every day a huge amount of MRI scans to assess tumour progression and response to therapy [49]. For this reason, it is very convenient to have fast sampling through compressed sensing, and automatic segmentation methods. Furthermore, reconstructions with enhanced edges are advantageous to facilitate clinical analysis.

Here we show our results for MRI data of a rat bearing a glioblastoma. The MR image represents the rat head where the brain is the gray area in the top half of the image. Inside this gray region, a tumour is clearly visible appearing as a brighter area. For this experiment, we acquired the full k-space and present the zero-filling reconstruction in figure 8(a) and the sequential segmentation in figure 8(e). As discussed already in the previous section, the zero-filled reconstruction presents noise and artefact which may complicate the segmentation. We want to show that the compressed sensing approach and in particular the joint model can improve this reconstruction. Given the full k-space, we select 15% of the samples using a spiral mask. In figures 8(b), (f) and figures 8(c), (g) we show the results for the sequential approaches. In figures 8(d) and (h) we show the joint reconstruction and the joint segmentation obtained for the same data. The regularised approaches perform better that the zero-filled reconstruction, producing less noisy results. However, our joint model is able to produce a cleaner reconstruction where the edges that defines the tumour and the brain are very well detected. In figure 9, we show a zoomed section where it is easy to assess that the joint model tackle the noise and detect the region of interest. We can see that we are able to improve the reconstruction and automatically identify the tumour in the brain. The degree of enhancement of the edges in the reconstruction is controllable by the parameter $\delta$ in the model (11). In the next subsection we present a discussion on how to tune this parameter.

**Figure 8.** Reconstructions and segmentation for real MRI data. We select 15% of the samples using a spiral mask. The image show a rat brain bearing a tumour (brighter region). The zero-filled reconstruction (a) and the TV regularised reconstruction (b) are shown together with their sequential segmentation (e) and (f) respectively. In the last column (d) and (h) we show the results for our model. The parameter $\alpha$ for the TV reconstruction and for the joint reconstruction has been chosen such that it achieves visually optimal in the sense that it resolve all the details (e.g. the darker line cutting the tumour transversally). (a) Zero-filled reconstruction. (b) TV reconstruction $\alpha=0.01$ . (c) Bregman reconstruction $\alpha=1$ . (d) Joint reconstruction $\alpha=0.5$ . (e) Segmentation. (f) Segmentation $\beta=0.07$ . (g) Bregman segmentation $\beta=0.07$ . (h) Joint segmentation $\beta=0.01$ , $\delta=0.01$ .
Download figure:
Standard image High-resolution image

**Figure 8.** Reconstructions and segmentation for real MRI data. We select 15% of the samples using a spiral mask. The image show a rat brain bearing a tumour (brighter region). The zero-filled reconstruction (a) and the TV regularised reconstruction (b) are shown together with their sequential segmentation (e) and (f) respectively. In the last column (d) and (h) we show the results for our model. The parameter $\alpha$ for the TV reconstruction and for the joint reconstruction has been chosen such that it achieves visually optimal in the sense that it resolve all the details (e.g. the darker line cutting the tumour transversally). (a) Zero-filled reconstruction. (b) TV reconstruction $\alpha=0.01$ . (c) Bregman reconstruction $\alpha=1$ . (d) Joint reconstruction $\alpha=0.5$ . (e) Segmentation. (f) Segmentation $\beta=0.07$ . (g) Bregman segmentation $\beta=0.07$ . (h) Joint segmentation $\beta=0.01$ , $\delta=0.01$ .
Download figure:
Standard image High-resolution image

**Figure 9.** Zoomed section on the tumor for the different approaches. (a) Zero-filled reconstruction. (b) TV reconstruction. (c) Bregman reconstruction. (d) Joint reconstruction. (e) Segmentation. (f) Segmentation. (g) Bregman segmentation. (h) Joint segmentation.
Download figure:
Standard image High-resolution image

4.3. Parameter choice rule

In the model proposed in (11), the parameters that we need to choose are $\alpha$ , $\beta$ and $\delta$ . In this section we discuss a rule to choose them depending on the desired results. Some examples will clarify these empirical choices.

$\alpha$ balances the total variation regularization term in the reconstruction for the magnitude. The higher the $\alpha$ , the more piecewise constant the reconstruction will be. See figure 10.
$\beta$ defines the scale of the objects that will be detected in the segmentation. Smaller values of $\beta$ will allow for smaller objects. See figure 11.
$\delta$ is the parameter linking the reconstruction and the segmentation. To better illustrate its role, let us consider a zero-filling like reconstruction and segmentation:
$\begin{align} \newcommand{\e}{{\rm e}} \newcommand{\im}{{\rm Im}} \displaystyle \imath_{\mathcal{SF} \cdot = f}(u)+ \delta \sum_i\in\Omega \sum_{j=1}^{\ell} v_{ij} (c_j - u_i)^2 {\rm d}x + \beta \|\nabla v\| \rightarrow \min_{u,v}, \label{eq:zerojoint} \nonumber \end{align} \tag{ 26 }$
where $\newcommand{\im}{{\rm Im}} \newcommand{\e}{{\rm e}} \imath(u) = \left\{\begin{array}{@{}lll@{}} +\infty , & {{\rm if}}\ \mathcal{SF}u \neq f \\ 0, & {{\rm if}}\ \mathcal{SF}u = f \end{array}\right.$ . This problem is solving the zero-filled reconstruction and segmentation jointly. For $\delta=0$ , the reconstruction is the zero-filling solution. In figure 12 we can see the impact of the segmentation term on the reconstruction for increasing values of $\delta$ . We can see that for very small $\delta$ the result is close to the zero-filling solution. For $\delta=1$ the noise from the model is present as expected but in addition the boundaries are enhanced. For large $\delta$ the boundaries are still very pronounced and the noise is also amplified.

**Figure 10.** The parameter $\alpha$ balances the data fidelity term and the total variation regularisation for the reconstruction. Smaller values of $\alpha$ produce a reconstruction closer to the data fitting term, hence less smooth as in (a). As $\alpha$ increases in (b) the solution gets smoother and less noisy. Finally for large values it tends to become more piecewise constant as in (c). (a) $\alpha = 0.001$ . (b) $\alpha = 0.01$ . (c) $\alpha = 0.1$ .
Download figure:
Standard image High-resolution image

**Figure 11.** The parameter $\beta$ determines the scale of the objects that we are segmenting. Smaller values of $\beta$ can detect smaller objects (a), which are lost for intermediate values (b). Finally very large values only detect main structures (c). (a) $\beta = 0.1$ . (b) $\beta = 1$ . (c) $\beta = 3$ .
Download figure:
Standard image High-resolution image

**Figure 11.** The parameter $\beta$ determines the scale of the objects that we are segmenting. Smaller values of $\beta$ can detect smaller objects (a), which are lost for intermediate values (b). Finally very large values only detect main structures (c). (a) $\beta = 0.1$ . (b) $\beta = 1$ . (c) $\beta = 3$ .
Download figure:
Standard image High-resolution image

**Figure 12.** We show the reconstructions obtained solving (26) for different values of $\delta$ . For $\delta=0$ we get the zero-filling solution. For small $\delta$ we expect the solution to be similar to the zero-filling reconstruction. For $\delta=1$ we see the effect of the joint term on the reconstruction. The solution presents the same noise artefacts but having in addition very sharp boundaries. Finally, for very large $\delta$ we still have enhanced boundaries but we also amplify the noise. (a) $\delta = 0$ . (b) $\delta = 0.1$ . (c) $\delta = 1$ . (d) $\delta = 2.5$ .
Download figure:
Standard image High-resolution image

**Figure 12.** We show the reconstructions obtained solving (26) for different values of $\delta$ . For $\delta=0$ we get the zero-filling solution. For small $\delta$ we expect the solution to be similar to the zero-filling reconstruction. For $\delta=1$ we see the effect of the joint term on the reconstruction. The solution presents the same noise artefacts but having in addition very sharp boundaries. Finally, for very large $\delta$ we still have enhanced boundaries but we also amplify the noise. (a) $\delta = 0$ . (b) $\delta = 0.1$ . (c) $\delta = 1$ . (d) $\delta = 2.5$ .
Download figure:
Standard image High-resolution image

4.4. Comparison with another joint approach

We present a comparison of our joint model with another non-convex method, namely the Potts model approach by [33], described in section 2.4. The major advantage of the joint reconstruction and segmentation using the Potts model is that it does not require to select explicitely the number of regions to segment, although this depends on the choice of the regularisation parameter. However, by definition, it only produces a piecewise constant image, therefore a segmentation, and not a reconstruction. This is useful in some applications where one is only interested in the segmentation. In contrast, our model produces both reconstruction and segmentation. In figure 13, we show the results for some examples. Note that because the results of the Potts model are in the range of the groundtruth image, while our segmentation are in label space, we can not directly use the RSE as before, or common metrics that compare actual intensities such as PSNR and structure similarity index measure (SSIM), for comparison. For example, for some tissue in class 1, to label it class 2 is as wrong as to label it class 3. However in this case, the SSIM and PSNR will favour the label class 2.

**Figure 13.** Comparison of our joint approach with the Potts model. Noise level and undersampling rate are described in figures 2, 3 and 8. The results are presented for three different examples and for two different choices of the regularisation parameter $\gamma$ . We can see that the Potts model tends to overestimate the number of regions to segment. (a) Groundtruth. (b) Joint segmentation. (c) Potts model, $\gamma=0.01$ . (d) Potts model, $\gamma=0.05$ . (e) Groundtruth. (f) Joint segmentation. (g) Potts model, $\gamma=0.75$ . (h) Potts model, $\gamma=0.5$ . (i) Segmentation from zero-filled reconstruction. (j) Joint segmentation. (k) Potts model, $\gamma=0.05$ . (l) Potts model, $\gamma=0.1$ .
Download figure:
Standard image High-resolution image

**Figure 13.** Comparison of our joint approach with the Potts model. Noise level and undersampling rate are described in figures 2, 3 and 8. The results are presented for three different examples and for two different choices of the regularisation parameter $\gamma$ . We can see that the Potts model tends to overestimate the number of regions to segment. (a) Groundtruth. (b) Joint segmentation. (c) Potts model, $\gamma=0.01$ . (d) Potts model, $\gamma=0.05$ . (e) Groundtruth. (f) Joint segmentation. (g) Potts model, $\gamma=0.75$ . (h) Potts model, $\gamma=0.5$ . (i) Segmentation from zero-filled reconstruction. (j) Joint segmentation. (k) Potts model, $\gamma=0.05$ . (l) Potts model, $\gamma=0.1$ .
Download figure:
Standard image High-resolution image

We therefore focus on a visual assessment and show the results of the Potts model for two different choices of the regularisation parameter $\gamma$ . We recall that the proposed model requires to determine the number of classes in advance, while the model for comparison estimates the number of regions but this depends on the choice of the regularisation parameter. In the top row, we can see that the Potts model, although it retrieves the shape of the main structures for the brain phantom example, it overestimates the number of classes. By increasing the parameter $\gamma$ , this issue is not resolved as it assigns different intensities to objects of the same category. In contrast, our approach is able to identify the desired classes as in the groundtruth. For the bubble case (middle row), we can see that our method works better and our segmentation is more accurate, while the Potts model fails to capture shape details (e.g. outer circle is distorted) and again overestimates the number of regions. We can also see that, when slightly decreasing $\gamma$ , the Potts model is very sensitive to artefacts. For the real MR data (bottom row), we see that both methods identify the tumour quite well. Because we were only interested in identifying three classes as tumour, brain and background, we do not segment the outer region (rat's head), captured insted by the Potts model. However, the Potts model only produces the segmentation, while our method, as shown in figure 8, also produces an enhanced reconstruction with sharp edges.

5. Conclusion

In this paper, we have investigated a novel mathametical approach to perform simultaneously reconstruction and segmentation from undersampled MRI data. Our motivation was to include in the reconstruction prior knowledge of the objects we are interested in. By interconnecting the reconstruction and the segmentation terms, we can achieve sharper reconstructions and more accurate segmentations. We derived a variational model based on Bregman iteration and we have verified its convergence properties. With our approach we show that by solving the more complicated joint model, we are able to improve both reconstruction and segmentation compared to the traditional sequential approach. This suggests that with the joint model it is possible to reduce error propagations that occur in sequential analysis, when the segmentation is separate and posterior to the reconstruction.

We have tested our method for two different application, which are bubbly flow and cancer imaging. In both cases, the reconstructions are sharper and finer structures are detected. Additionally, the segmentations also benefit from the improvement in the reconstructions. We have found that the joint model outperforms the sequential approach by exploiting prior information on the objects that we want to segment. In addition, we also show that our method performs better than the well-known Potts model. We also presented a discussion on the parameter choice rule that offer some insight on how to tune the parameters according to the desired result. It is interesting to notice that, with our model, we are able to control the segmentation effect on the reconstruction. Furthermore, when the final analysis of the MR image is indeed the segmentation, it is possible to bias the reconstruction towards the piecewise constant solution, yet preserving finer details in the structure.

In our set-up, we have specified the intensity constants characteristic of the region of interests, which were known a priori for our applications. However, it is possible to also include the optimisation with respect to c_j in our joint model, where the same convergence guarantees hold (see remark 2). Nevertheless, one limitation of the model is the need to specify the number of regions to be segmented.

In our future research, we would like to study the extension of this model for the bubbly flow to the reconstruction of the magnitude image as well as the phase image. The goal is not only to extract the structure of the bubble, but also to estimate velocity information, which is encoded in the phase image. As the problem is non-convex in the joint argument but also non-convex with respect to the phase, we need to derive a different convergence analysis.

Acknowledgments

VC acknowledges the financial support of the Cambridge Cancer Centre. MB acknowledges the Leverhulme Trust Early Career Fellowship 'Learning from mistakes: a supervised feedback-loop for imaging applications', the Isaac Newton Trust and the Cantab Capital Institute for the Mathematics of Information. MJE and CBS acknowledge support from Leverhulme Trust project 'Breaking the non-convexity barrier', EPSRC Grant No. 'EP/M00483X/1', EPSRC centre 'EP/N014588/1', the Cantab Capital Institute for the Mathematics of Information, and from CHiPS and NoMADS (Horizon 2020 RISE project grant). Moreover, CBS is thankful for support by the Alan Turing Institute. We would like to thank Kevin Brindle and Alan Wright for the acquisition of the data in figure 8.

Appendix. Numerical results on phantoms

**Figure A1.** This example shows clearly the effect of the parameter $\delta$ in the joint model. The segmentation is easy to achieve and we do not see a significant improvement in joint segmentation compared to the TV sequential segmentation, but there is a small gain compared to the sequential Bregman segmantation. However, the joint reconstruction results improved thanks to the parameter $\delta$ which biases the reconstruction to be closer to the segmentation. (a) Groundtruth. (b) TV reconstruction, $\alpha=0.15$ , RRE = 0.0305, PSNR = 27.44. (c) Bregman reconstruction, $\alpha=1.1$ , RRE = 0.0427, PSNR = 27.21. (d) Joint reconstruction, $\alpha=0.8$ , RRE = 0.0262, PSNR = 28.27. (e) Sampling matrix, 15%. (f) Segmentation, $\beta=0.001$ RSE = 0.0219. (g) Bregman segmentation, $\beta=0.001$ RSE = 0.0399. (h) Joint segmentation, $\beta=0.001$ , $\delta=2$ , RSE = 0.0219.
Download figure:
Standard image High-resolution image

**Figure A2.** In this example, we can see that the reconstructions are quite similar. However in the joint reconstruction, the outer yellow circle, which is completely ignored by the sequential reconstructions, is partially detected. This is also the case for the joint segmenation. (a) Groundtruth. (b) TV reconstruction, $\alpha=0.3$ , RRE = 0.0578, PSNR = 21.43. (c) Bregman reconstruction, $\alpha=1.5$ , RRE = 0.1307, PSNR = 21.49. (d) Joint reconstruction, $\alpha=1.5$ RRE = 0.0713, PSNR = 21.87. (e) Sampling matrix, 15%. (f) Segmentation, $\beta=0.001$ , RSE = 0.096. (g) Bregman segmentation, $\beta=0.001$ , RSE = 0.121. (h) Joint segmentation, $\beta=0.001$ , $\delta=0.1$ , RSE = 0.091.
Download figure:
Standard image High-resolution image

**Figure A2.** In this example, we can see that the reconstructions are quite similar. However in the joint reconstruction, the outer yellow circle, which is completely ignored by the sequential reconstructions, is partially detected. This is also the case for the joint segmenation. (a) Groundtruth. (b) TV reconstruction, $\alpha=0.3$ , RRE = 0.0578, PSNR = 21.43. (c) Bregman reconstruction, $\alpha=1.5$ , RRE = 0.1307, PSNR = 21.49. (d) Joint reconstruction, $\alpha=1.5$ RRE = 0.0713, PSNR = 21.87. (e) Sampling matrix, 15%. (f) Segmentation, $\beta=0.001$ , RSE = 0.096. (g) Bregman segmentation, $\beta=0.001$ , RSE = 0.121. (h) Joint segmentation, $\beta=0.001$ , $\delta=0.1$ , RSE = 0.091.
Download figure:
Standard image High-resolution image

**Figure A3.** In this example for the bubbly flow, we can see clearly an improvement for both joint reconstruction and joint segmentation. The contrast in the joint reconstruction is better recovered and the segmentation is more accurate, especially for the bubbles close to the edge of the pipe. The joint method results particularly useful for the bubbly flow application. (a) Groundtruth. (b) TV reconstruction, $\alpha=0.15$ , RRE = 0.074, PSNR = 17.15. (c) Bregman reconstruction, $\alpha=2$ , RRE = 0.071, PSNR = 17.65. (d) Joint reconstruction, $\alpha=0.8$ , RRE = 0.047, PSNR = 19.015. (e) Sampling matrix, 8%. (f) Segmentation, $\beta=0.01$ , RSE = 0.016. (g) Bregman segmentation, $\beta=0.01$ , RSE = 0.022. (h) Joint segmentation, $\beta=0.01$ , $\delta=1$ , RSE = 0.014.
Download figure:
Standard image High-resolution image

**Figure A3.** In this example for the bubbly flow, we can see clearly an improvement for both joint reconstruction and joint segmentation. The contrast in the joint reconstruction is better recovered and the segmentation is more accurate, especially for the bubbles close to the edge of the pipe. The joint method results particularly useful for the bubbly flow application. (a) Groundtruth. (b) TV reconstruction, $\alpha=0.15$ , RRE = 0.074, PSNR = 17.15. (c) Bregman reconstruction, $\alpha=2$ , RRE = 0.071, PSNR = 17.65. (d) Joint reconstruction, $\alpha=0.8$ , RRE = 0.047, PSNR = 19.015. (e) Sampling matrix, 8%. (f) Segmentation, $\beta=0.01$ , RSE = 0.016. (g) Bregman segmentation, $\beta=0.01$ , RSE = 0.022. (h) Joint segmentation, $\beta=0.01$ , $\delta=1$ , RSE = 0.014.
Download figure:
Standard image High-resolution image

Enhancing joint reconstruction and segmentation with non-convex Bregman iteration

Article metrics

Submit

Author e-mails

Author affiliations

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. MRI reconstruction and segmentation

2.1. Reconstruction

2.2. Segmentation

2.3. Joint reconstruction and segmentation

2.4. Comparison to other joint reconstruction and segmentation approaches

2.4.1. CT/SPECT.

2.4.2. PET and transmission tomography.

2.4.3. MRI.

2.4.4. Summary.

3. Optimisation

3.1. Convergence analysis

3.2. Proof of theorem 1

4. Numerical results

4.1. Bubbly flow

4.2. Cancer imaging

4.3. Parameter choice rule

4.4. Comparison with another joint approach

5. Conclusion

Acknowledgments

Appendix. Numerical results on phantoms

Enhancing joint reconstruction and segmentation with non-convex Bregman iteration

Article metrics

Submit

Share this article

Author e-mails

Author affiliations

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. MRI reconstruction and segmentation

2.1. Reconstruction

2.2. Segmentation

2.3. Joint reconstruction and segmentation

2.4. Comparison to other joint reconstruction and segmentation approaches

2.4.1. CT/SPECT.

2.4.2. PET and transmission tomography.

2.4.3. MRI.

2.4.4. Summary.

3. Optimisation

3.1. Convergence analysis

3.2. Proof of theorem 1

4. Numerical results

4.1. Bubbly flow

4.2. Cancer imaging

4.3. Parameter choice rule

4.4. Comparison with another joint approach

5. Conclusion

Acknowledgments

Appendix. Numerical results on phantoms