Abstract
In this paper, we consider an improved iterative method for solving the monotone inclusion problem in the form of \(0 \in A(x) + D(x) + N_{C}(x)\) in a real Hilbert space, where A is a maximally monotone operator, D and B are monotone and Lipschitz continuous, and C is the nonempty set of zeros of the operator B. We investigate the weak ergodic and strong convergence (when A is strongly monotone) of the iterates produced by our considered method. We show that the algorithmic scheme can also be applied to minimax problems. Furthermore, we discuss how to apply the method to the inclusion problem involving a finite sum of compositions of linear continuous operators by using the product space approach and employ it for convex minimization. Finally, we present a numerical experiment in TV-based image inpainting to validate the theoretical results.
1 Introduction
In the last decade, penalty schemes have become a popular approach for studying constrained or hierarchical optimization problems in Hilbert spaces, particularly for solving complex problems (see [1‐7]). In this work, we intend to study the general monotone inclusion problem in terms of the sum of three operators in the following form: \( 0 \in A(x) + D(x) + N_{C}(x), \)
where \(A: \mathcal {H} \rightrightarrows \mathcal {H}\) is a maximally monotone operator, \(D:\mathcal {H}\rightarrow \mathcal {H}\) a (single-valued) monotone and \(\eta ^{-1}\)-Lipschitz continuous operator with \(\eta >0\), and \(N_C: \mathcal {H} \rightrightarrows \mathcal {H}\) is the normal cone of the closed convex set \(C\subseteq \mathcal {H}\), which is the nonempty set of zeros of another operator \(B : \mathcal {H} \rightarrow \mathcal {H}\), a (single-valued) \(\mu ^{-1}\)-Lipschitz continuous operator with \(\mu >0\). If the involved operators are subdifferentials, the situation can be framed as a specialized convex bilevel optimization problem. In this context, the feasible set of the upper-level problem corresponds to the set of solutions of the lower-level problem, and both sets are characterized by convexity (see (3) and (4) below).
If \(D=0\), problem (1) is a generalized version of the monotone inclusion problem \( 0 \in A(x) + N_{C}(x), \)
where \(C=\arg \min \Psi \ne \emptyset \), and \(\Psi : \mathcal {H} \rightarrow \mathbb {R}\) is a convex differentiable function with a Lipschitz gradient satisfying \(\min \Psi = 0\), which introduces a penalization function with respect to the constraint \(x\in C\). Implicit and explicit discretized methods to solve the problem in (2) were proposed in Attouch et al.’s work [1, 2]. Several subsequent penalty type numerical algorithms in the literature are inspired by the continuous nonautonomous differential inclusion investigated in Attouch and Czarnecki’s paper [8].
In the case where A and D are the convex subdifferentials of the proper, convex, and lower semicontinuous functions \(\Phi : \mathcal {H} \rightarrow \bar{\mathbb {R}}\) and \(\phi : \mathcal {H} \rightarrow \bar{\mathbb {R}}\), respectively, any solution of (1) solves the convex minimization problem
respectively. Consequently, a solution to these convex minimization problems can be approximated by using the same iterative scheme as in [1, 2, 5]. Furthermore, there are other methods for solving such optimization problems, studied by several authors in the literature (see, for instance, [3, 6, 7, 9]).
Iterative algorithms for solving both monotone inclusion problems (1) and (2) in the penalty scheme have been proposed by several researchers (see [1‐5]). These works establish the weak ergodic convergence of the iterates of the proposed algorithms; to prove strong convergence, they additionally assume the strong monotonicity of A. However, the (ergodic) convergence results require the following hypotheses:
For solving the problem (2) in case \(C=\arg \min \Psi \ne \emptyset \) with the algorithm proposed by Attouch et al. [1, 2], the Fenchel conjugate function of \(\Psi \) (namely, \(\Psi ^{*}\)) needs to fulfill some key hypotheses in this context, as shown below: \((H){\left\{ \begin{array}{ll} (i) & A+N_C \;\text {is maximally monotone and} \; zer(A+N_C)\ne \emptyset ;\\ (ii) & \text {For every } p\in {\text {ran}} N_C, \sum _{n\in \mathbb {N} } \lambda _n\beta _n \left[ \Psi ^* \left( \frac{p}{\beta _n}\right) -\sigma _C \left( \frac{p}{\beta _n}\right) \right] < +\infty ;\\ (iii) & (\lambda _n)_{n\in \mathbb {N} } \in \ell ^2 \setminus \ell ^1, \end{array}\right. }\) where \((\lambda _n)_{n\in \mathbb {N}}\) and \((\beta _n)_{n\in \mathbb {N}}\) are sequences of positive real numbers. Note that hypothesis (ii) of (H) is satisfied when \(\sum _{n\in \mathbb {N}} \frac{\lambda _n}{\beta _n} < +\infty \) and \(\Psi \) is bounded below by a multiple of the square of the distance to C, as described in [2]. One such case is when \(C=zer L\), where \(L: \mathcal {H}\rightarrow \mathcal {H}\) is a linear continuous operator with closed range and \(\Psi : \mathcal {H}\rightarrow \mathbb {R}\) is defined as \(\Psi (x) = \frac{1}{2}\Vert Lx \Vert ^2 \); see Section 4.1 in [1].
For solving the problem (1) with a forward-backward type or Tseng's type (or forward-backward-forward) algorithm as proposed by Boț et al. in [3‐5], the required hypotheses involve the Fitzpatrick function associated with the maximally monotone operator B and read as: \((H_{fitz}){\left\{ \begin{array}{ll} (i) & A+N_C \;\text {is maximally monotone and} \; zer(A+D+N_C)\ne \emptyset ;\\ (ii) & \text {For every } p\in {\text {ran}} N_C, \sum _{n\in \mathbb {N}} \lambda _n\beta _n \left[ \sup \limits _{u\in C} \varphi _{B} \left( u,\frac{p}{\beta _n}\right) -\sigma _C \left( \frac{p}{\beta _n}\right) \right] < +\infty ;\\ (iii) & (\lambda _n)_{ n\in \mathbb {N}} \in \ell ^2 \setminus \ell ^1, \end{array}\right. }\) where \((\lambda _n)_{n\in \mathbb {N}}\) and \((\beta _n)_{n\in \mathbb {N}}\) are sequences of positive real numbers. Note that when \(B= \partial \Psi \) and \(\Psi (x)=0\) for all \(x\in C\), then by (5) one can see that condition (ii) of (H) implies the second condition of \((H_{fitz})\); see [4, 5].
For solving the monotone inclusion problem (1), if we use the approach based on the forward-backward algorithm, then some of the operators appearing in the iterates must be cocoercive. For instance, (see [5])
with the sequences of positive real numbers \((\lambda _n)_{n\in \mathbb {N}}\) and \((\beta _n)_{n\in \mathbb {N}}\), and where the operators D and B are cocoercive. We know from the definition of cocoercivity (see Section 2) that every cocoercive operator is monotone and Lipschitz continuous, but the converse does not hold in general. One research direction is to develop algorithms for inclusions involving monotone and Lipschitz continuous operators, known as Tseng's type algorithms (or forward-backward-forward algorithms); see further [10]. In the penalty scheme, Boț and Csetnek [5] relaxed the cocoercivity of B and D to monotonicity and Lipschitz continuity, and the convergence properties of their algorithm are studied under \((H_{fitz})\) and the condition \(\limsup _{n \rightarrow + \infty } \left( \frac{\lambda _{n}\beta _n}{\mu } + \frac{\lambda _{n}}{\eta } \right) < 1\).
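To make this gap between the two operator classes concrete, the following small Python check (ours, for illustration only; not part of the original analysis) shows that a \(90^{\circ }\) rotation in \(\mathbb {R}^2\) is monotone and 1-Lipschitz continuous but not \(\gamma \)-cocoercive for any \(\gamma >0\), the standard example motivating Tseng-type methods:

```python
import numpy as np

# 90-degree rotation in R^2: monotone and 1-Lipschitz, but not cocoercive
# (so forward-backward is not applicable, while Tseng-type methods are).
R = np.array([[0.0, -1.0],
              [1.0,  0.0]])

rng = np.random.default_rng(0)
for _ in range(5):
    x, y = rng.standard_normal(2), rng.standard_normal(2)
    d, Rd = x - y, R @ x - R @ y
    mono = d @ Rd                                   # = 0: monotone, but only just
    lip = np.linalg.norm(Rd) / np.linalg.norm(d)    # = 1: Lipschitz ratio
    coco = mono / np.linalg.norm(Rd) ** 2           # = 0: no gamma > 0 works
    print(f"<x-y, Rx-Ry> = {mono:+.2e}, Lip ratio = {lip:.2f}, coco ratio = {coco:+.2e}")
```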
In recent years, Tseng’s forward-backward-forward algorithm has been modified by many researchers in various versions depending on the setting of the operators (see [11‐15], and references therein). One such modification is the Tseng’s forward-backward-forward algorithm with extrapolation from the past, which is developed by using Popov’s idea [16] for the extragradient method with the extrapolated technique. This algorithm can store and reuse the extrapolated term in the next step of the iterative scheme, illustrated as
where \(\gamma \) satisfies a suitable condition for each method and \(J_A\) is the resolvent operator of A. Indeed, the classical Tseng's method requires two forward evaluations in every iteration, whereas the modified algorithm with extrapolation from the past can reuse one of the calculations from the prior iteration. This scheme therefore has the potential to reduce computational costs and energy consumption in practical applications.
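As a hedged sketch of this idea (ours; `resolvent_A`, `F`, and `gamma` are illustrative placeholders for the problem data, with `F` standing for the single-valued part of the inclusion), the two update rules might be organized as follows, the extrapolated variant carrying one operator evaluation over to the next iteration:

```python
def tseng_step(x, F, resolvent_A, gamma):
    # Classical Tseng (FBF): two fresh operator evaluations per iteration.
    Fx = F(x)
    y = resolvent_A(x - gamma * Fx)
    return y + gamma * (Fx - F(y))

def tseng_ep_step(x, F_y_prev, F, resolvent_A, gamma):
    # FBF with extrapolation from the past: reuse F(y_{n-1}) computed in the
    # previous iteration, so only one fresh evaluation F(y_n) is needed.
    y = resolvent_A(x - gamma * F_y_prev)
    F_y = F(y)
    x_next = y + gamma * (F_y_prev - F_y)
    return x_next, F_y   # carry F(y_n) into the next step
```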
Motivated by these considerations, we investigate Tseng’s forward-backward-forward algorithm with extrapolation from the past involving a penalty scheme under certain hypotheses for solving the generalized inclusion problem in terms of the sum of three operators (1). We propose the algorithm and prove its weak ergodic convergence. Moreover, strong convergence is also demonstrated when the operator A is a strongly monotone operator in Section 3. Furthermore, we can extend our results in Section 3 to solve minimax problems as elaborated in Section 4. In Section 5, our proposed algorithm can be extended to solving more intricate problems involving the finite sum of the composition of the linear continuous operator with maximally monotone operators by using the product space approach, and the convergence results for our modified iterative method are also provided.
To broaden the applicability of our scheme, we also provide the iterative scheme and its convergence result for the convex minimization problem in Section 6. Finally, we present a numerical experiment in TV-based image inpainting to validate the theoretical convergence results in Section 7.
2 Preliminaries
In this paper, we introduce some notation used throughout. We denote the set of positive integers by \(\mathbb {N}\) and a real Hilbert space with an inner product \(\langle \cdot , \cdot \rangle \) and the associated norm \(\Vert \cdot \Vert = \sqrt{\langle \cdot , \cdot \rangle }\) by \(\mathcal {H}\). A sequence \((x_n)_{n\in \mathbb {N}}\) is said to converge weakly to x if, for any \(y\in \mathcal {H}\), \(\langle x_n, y \rangle \rightarrow \langle x, y \rangle \); we use the symbols \(\rightharpoonup \) and \(\rightarrow \) to represent weak and strong convergence, respectively. For a linear continuous operator \(L: \mathcal {H}\rightarrow \mathcal {G}\), where \(\mathcal {G}\) is another real Hilbert space, we define the norm of L as \(\Vert L \Vert = \sup \{ \Vert Lx\Vert : x\in \mathcal {H}, \Vert x\Vert \le 1 \}\). We also use \(L^* : \mathcal {G} \rightarrow \mathcal {H}\) to denote the adjoint operator of L, defined by \(\langle L^* y, x \rangle = \langle y, Lx \rangle \) for all \((x,y) \in \mathcal {H} \times \mathcal {G}\).
For a function \(f: \mathcal {H} \rightarrow \bar{\mathbb {R}}\) (where \( \bar{\mathbb {R}}:= \mathbb {R}\cup \{\pm \infty \}\) is the extended real line), we denote its effective domain by \({\text {dom}} f = \{ x \in \mathcal {H} : f(x) < + \infty \}\) and say that f is proper if \({\text {dom}} f\ne \emptyset \) and \(f(x) \ne - \infty \) for all \(x\in \mathcal {H}\). Let \(f^* : \mathcal {H} \rightarrow \bar{\mathbb {R}}\), where \(f^*(u) = \sup _{x\in \mathcal {H}} \{ \langle u, x \rangle - f(x) \}\) for all \(u\in \mathcal {H}\), be the conjugate function of f. We denote by \(\Gamma (\mathcal {H})\) the family of proper, convex, and lower semicontinuous extended real-valued functions defined on \(\mathcal {H}\). The subdifferential of f at \(x\in \mathcal {H}\), with \(f(x)\in \mathbb {R}\), is the set \(\partial f(x) := \{v \in \mathcal {H} : f(y) \ge f(x) + \langle v, y-x \rangle \; \forall y\in \mathcal {H} \}\). We take, by convention, \(\partial f(x) :=\emptyset \) if \(f(x) \in \{ \pm \infty \}\).
The indicator function of a nonempty set \(S\subseteq \mathcal {H}\), denoted by \(\delta _{S} : \mathcal {H}\rightarrow \bar{\mathbb {R}}\), is defined as follows: \(\delta _{S}(x)=0\) if \(x\in S\) and \(\delta _{S}(x)=+\infty \) otherwise. The subdifferential of the indicator function is called the normal cone of S, denoted by \(N_{S}(x)\). For \(x \in S\), \(N_{S}(x)=\{ u\in \mathcal {H} : \langle y-x, u \rangle \le 0 \; \forall y \in S \}\), and \(N_{S}(x) = \emptyset \) if \(x\notin S\). It is worth noting that for \(x\in S\), \(u \in N_{S} (x)\) if and only if \(\sigma _{S}(u) = \langle x, u \rangle \), where \(\sigma _{S}\) is the support function of S. The support function is defined as \(\sigma _{S} (u) = \sup _{y\in S} \langle y, u \rangle \). A cone is a set that is closed under positive scaling, i.e., \(\lambda x \in S\) for all \(x \in S\) and \(\lambda > 0\). The normal cone, which is a tool used in optimization, is always a cone itself.
Let A be a set-valued operator mapping from a Hilbert space \(\mathcal {H}\) to itself. We define the graph of A as \(GrA=\{(x,u)\in \mathcal {H}\times \mathcal {H}:u\in A(x) \}\), the domain of A as \({\text {dom}} A=\{x\in \mathcal {H}: A(x) \ne \emptyset \}\), and the range of A as \({\text {ran}} A=\{u\in \mathcal {H}:\exists x\in \mathcal {H} \;\text {s.t.} \; u\in A(x) \}\). The inverse operator of A, denoted by \(A^{-1}:\mathcal {H}\rightrightarrows \mathcal {H}\), is defined as \((u,x)\in Gr A^{-1}\) if and only if \((x,u)\in Gr A\). We also define the set of zeros of A as \(zerA=\{x\in \mathcal {H}:0\in A(x) \}\). The operator A is said to be monotone if \(\langle x-y, u-v \rangle \ge 0\) for all \((x,u),(y,v)\in Gr A\). Moreover, a monotone operator A is said to be maximally monotone if its graph on \(\mathcal {H}\times \mathcal {H}\) has no proper monotone extension. If A is maximally monotone, then zerA is a closed and convex set, and a point \(z\in \mathcal {H}\) belongs to zerA if and only if \(\langle u-z, w \rangle \ge 0\) for all \((u,w)\in Gr A\). Conditions ensuring that zerA is nonempty can be found in Section 23.4 of [17].
The set-valued operator A is said to be \(\gamma \)-strongly monotone with \(\gamma > 0\) if and only if \(\langle x-y, u-v \rangle \ge \gamma \Vert x-y\Vert ^2\) for all \((x,u),(y,v) \in Gr A\). If A is maximally monotone and strongly monotone, then the set of zeros zerA is a singleton and thus nonempty (refer to Corollary 23.37 in [17]). A single-valued operator \(A : \mathcal {H}\rightarrow \mathcal {H}\) is considered to be \(\gamma \)-cocoercive if \(\langle x-y, A(x)-A(y) \rangle \ge \gamma \Vert A(x)-A(y) \Vert ^2\), and \(\gamma \)-Lipschitz continuous (or \(\gamma \)-Lipschitzian) if \(\Vert A(x)-A(y) \Vert \le \gamma \Vert x-y\Vert \) for all \((x,y)\in \mathcal {H}\times \mathcal {H}\).
In optimization, the proximal operator is a common tool used to develop algorithms for solving problems that involve the sum of a smooth function and a nonsmooth function. The proximal operator of a function f, denoted by \(\text {prox}_{f}:\mathcal {H}\rightarrow \mathcal {H}\), maps a point x to the unique minimizer of the function \(y\mapsto f(y) + \frac{1}{2}\Vert y-x\Vert ^2\). The resolvent of A, denoted by \(J_A : \mathcal {H} \rightrightarrows \mathcal {H}\), is defined as \(J_{A}=(Id+A)^{-1}\), where \(Id: \mathcal {H} \rightarrow \mathcal {H}\), \(Id(x) = x\) for all \(x\in \mathcal {H}\), is the identity operator on \(\mathcal {H}\). Moreover, if A is maximally monotone, then \(J_A: \mathcal {H}\rightarrow \mathcal {H}\) is single-valued and maximally monotone (see Corollary 23.10 in [17]). Notice that \(J_{ \partial f} = (Id + \partial f)^{-1}=\text {prox}_{f}\) (see also Proposition 16.34 in [17]).
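As a brief illustration (ours, under the assumption \(f = |\cdot |\) on the real line), the identity \(J_{\partial f} = \text {prox}_{f}\) can be checked numerically: both sides reduce to soft thresholding.

```python
import numpy as np

def prox_abs(x, t=1.0):
    # prox_{t|.|}(x): soft thresholding, the unique minimizer of
    # y -> t*|y| + 0.5*(y - x)^2.
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def resolvent_subdiff_abs(x, t=1.0):
    # J_{t * subdiff |.|}(x) = (Id + t*sign)^{-1}(x), solved case by case:
    # x > t -> y = x - t;  x < -t -> y = x + t;  |x| <= t -> y = 0.
    return np.where(x > t, x - t, np.where(x < -t, x + t, 0.0))

x = np.linspace(-3, 3, 7)
assert np.allclose(prox_abs(x), resolvent_subdiff_abs(x))  # J_{∂f} = prox_f
```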
The Fitzpatrick function, introduced by Fitzpatrick in [18], has proved to be a useful tool for investigating the maximality of monotone operators using convex analysis techniques (see [17, 19]). For a monotone operator A, we define its associated Fitzpatrick function \(\varphi _{A}: \mathcal {H}\times \mathcal {H}\rightarrow \bar{\mathbb {R}}\) as follows: \( \varphi _{A}(x,u) = \sup _{(y,v)\in Gr A} \left( \langle x, v \rangle + \langle y, u \rangle - \langle y, v \rangle \right) . \)
This function is convex and lower semicontinuous, and it will be an important tool for investigating convergence in this paper.
If A is a maximally monotone operator, then its Fitzpatrick function is proper and satisfies \(\varphi _{A}(x,u) \ge \langle x,u\rangle \) for all \((x,u) \in \mathcal {H}\times \mathcal {H}\), with equality if and only if \((x,u)\in Gr A\). In particular, the subdifferential \(\partial f\) of any convex function \(f\in \Gamma (\mathcal {H})\) is a maximally monotone operator (see [20]), and we have \((\partial f)^{-1} = \partial f^*\). Furthermore, we have the inequality (see [19])
Before entering into a detailed analysis of the convergence results, we introduce some definitions of sequences and useful lemmas that will be used several times in the paper. Let \((x_n)_{n\in \mathbb {N}\cup \{0\}}\) be a sequence in \(\mathcal {H}\) and \((\lambda _{k})_{k\in \mathbb {N}\cup \{0\}}\) be a sequence of positive numbers such that \(\sum _{k\in \mathbb {N}\cup \{0\}} \lambda _k = +\infty \). Let \((z_n)_{n\in \mathbb {N}\cup \{0\}}\) be the sequence of weighted averages defined as in [2, 4, 5]: \( z_n = \frac{1}{\tau _n} \sum _{k=0}^{n} \lambda _k x_k, \quad \text {where } \tau _n = \sum _{k=0}^{n} \lambda _k. \)
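In practice these averages need not be stored and recomputed from scratch; a hedged Python sketch (ours) updates \(z_n\) incrementally in O(1) memory:

```python
def ergodic_average(iterates, weights):
    # Running weighted average z_n = (sum_k w_k x_k) / (sum_k w_k),
    # updated incrementally: z <- z + (w/tau)*(x - z).
    z, tau = None, 0.0
    for x, w in zip(iterates, weights):
        tau += w
        z = x if z is None else z + (w / tau) * (x - z)
        yield z
```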
The following lemma is the Opial-Passty Lemma, which serves as a key tool for achieving the convergence results in our analysis.
Lemma 1
(Opial-Passty) Let T be a nonempty subset of a real Hilbert space \(\mathcal {H}\), and assume that \(\lim _{n\rightarrow \infty } \Vert x_n -x \Vert \) exists for every \(x\in T\). If every weak cluster point of \((x_n)_{n\in \mathbb {N}}\) (respectively \((z_n)_{n\in \mathbb {N}}\)) belongs to T, then \((x_n)_{n\in \mathbb {N}}\) (respectively \((z_n)_{n\in \mathbb {N}}\)) converges weakly to an element in T as \(n\rightarrow \infty \).
Another key lemma underlying our work is presented below; it establishes the convergence and summability of sequences satisfying its conditions. This lemma is taken from [1].
Lemma 2
Let \((a_n)_{n\in \mathbb {N}},(b_n)_{n\in \mathbb {N}}\) and \((\epsilon _{n})_{n\in \mathbb {N}}\) be real sequences. Assume that \((a_n)_{n\in \mathbb {N}}\) is bounded from below, \((b_n)_{n\in \mathbb {N}}\) is a nonnegative sequence, and \((\epsilon _n)_{n\in \mathbb {N}} \in \ell ^{1}\) is such that \( a_{n+1} \le a_n - b_n + \epsilon _n \text { for all } n\in \mathbb {N}. \)
Then \( (a_n)_{n\in \mathbb {N}} \) is convergent and \((b_n)_{n\in \mathbb {N}} \in \ell ^{1}\).
3 Forward-backward-forward with extrapolation and penalty schemes
In this section, we will begin by presenting a problem introduced in [4, 5], which can be formulated as follows:
Problem 1: Let \(\mathcal {H}\) be a real Hilbert space, \(A : \mathcal {H}\rightrightarrows \mathcal {H}\) be a maximally monotone operator, \(D : \mathcal {H}\rightarrow \mathcal {H}\) be a monotone and \(\eta ^{-1}\)-Lipschitz continuous operator with \(\eta >0\), \(B:\mathcal {H}\rightarrow \mathcal {H}\) be a monotone and \(\mu ^{-1}\)-Lipschitz continuous operator with \(\mu >0\), and suppose that \(C = zerB\ne \emptyset \). The monotone inclusion problem that we want to solve is
Some methods to solve Problem 1 have been already studied in [4, 5] by using Tseng’s algorithm and penalty scheme. The iterative scheme presented below for solving Problem 1 draws inspiration from Tseng’s forward-backward-forward algorithm with extrapolation from the past (see [11, 12, 15]).
Of course, in order to demonstrate the (ergodic) convergence results we need to assume the hypotheses \((H_{fitz})\) for every \(n\in \mathbb {N}\cup \{ 0\}.\)
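For a computational view, the following Python sketch (ours, reconstructed from the relations used in the proofs below, namely the inclusion preceding (11) and the bound on \(\Vert x_{n+1}-y_n\Vert \) in part (c) of the proof of Theorem 4; the authoritative form of Algorithm 1 is the displayed scheme) implements the updates \(y_n = J_{\lambda _n A}\big (x_n - \lambda _n (D(y_{n-1}) + \beta _n B(y_{n-1}))\big )\) and \(x_{n+1} = y_n + \lambda _n \big ( (D+\beta _n B)(y_{n-1}) - (D+\beta _n B)(y_n) \big )\). Since \(\beta _n\) changes with n, the stored values \(B(y_{n-1})\) and \(D(y_{n-1})\) are recombined with the current \(\beta _n\); only \(B(y_n)\) and \(D(y_n)\) are evaluated afresh in each iteration.

```python
def fbf_ep_penalty(x0, y_prev, J_A, D, B, lambdas, betas, n_iter):
    # J_A(z, lam) is assumed to compute the resolvent J_{lam A}(z).
    x, By_old, Dy_old = x0, B(y_prev), D(y_prev)
    for n in range(n_iter):
        lam, beta = lambdas[n], betas[n]
        F_old = Dy_old + beta * By_old            # stored values, current beta_n
        y = J_A(x - lam * F_old, lam)             # backward (resolvent) step
        By, Dy = B(y), D(y)                       # the only fresh forward evaluations
        x = y + lam * (F_old - (Dy + beta * By))  # correction step
        By_old, Dy_old = By, Dy                   # carried over to the next iteration
    return x
```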
Remark 1
1.
When the operator B satisfies \( B(x) =0\) for all \(x\in \mathcal {H}\) (which implies \(N_C(x)=\{0\}\) for every \(x\in \mathcal {H}\)), the algorithm outlined in Algorithm 1 reduces to the error-free case, with identity variable matrices, of the forward-backward-forward method with extrapolation from the past introduced in [15].
2.
Actually, we can choose \(y_{-1}\) as any starting point in \(\mathcal {H}\). This starting point affects the algorithm's performance, but we cannot specify which vector is the best choice for a given problem. This freedom of choice has pros and cons to consider. In Section 7, we compare our proposed method with the classical Tseng's algorithm in the penalty scheme, which requires only one starting point \(x_0\). To be fair, the starting point \(y_{-1}\) is selected to be the same as \(x_0\).
Prior to establishing the weak ergodic convergence result, we shall prove the following lemma, which serves as a valuable tool for our main outcome.
Lemma 3
Let \((x_n)_{n\in \mathbb {N}\cup \{0\} }\) and \((y_n)_{ n\in \mathbb {N}\cup \{0,-1\} }\) be the sequences generated by Algorithm 1 and let \((u,w)\in Gr(A+D+N_C)\) be such that \(w=v+p+D(u)\), where \(v\in A(u)\) and \(p\in N_C(u)\). Then the following inequality holds for \(n\in \mathbb {N}\cup \{0\}\):
From Algorithm 1 and the definition of the resolvent, we derive that \(\frac{x_n-y_n}{\lambda _n} -\beta _n B(y_{n-1})- D(y_{n-1})\in A(y_n)\) for every \(n\in \mathbb {N}\). Since \(v\in A(u)\), the monotonicity of A guarantees that
For any \(a_1,a_2,a_3 \in \mathcal {H}\), we know that \(\frac{1}{2}\Vert a_1-a_2\Vert ^2 - \frac{1}{2}\Vert a_3-a_1 \Vert ^2 +\frac{1}{2}\Vert a_3-a_2\Vert ^2 = \langle a_1-a_2, a_3-a_2 \rangle \). Then it follows from (11) that
By the parallelogram law, we know that \(2\Vert x_n-y_n\Vert ^2+2\Vert x_n-y_{n-1}\Vert ^2 = \Vert y_{n-1}-y_n\Vert ^2+\Vert (x_n-y_n)+(x_n-y_{n-1})\Vert ^2\). Then we have
Subsequently, we state the convergence of Algorithm 1 below.
Theorem 4
Let \((x_n)_{n\in \mathbb {N}\cup \{0\}}\) and \((y_n)_{n\in \mathbb {N}\cup \{0,-1\}}\) be the sequences generated by Algorithm 1 and \((z_n)_{n\in \mathbb {N}\cup \{0\}}\) be the sequence defined in (6). If (\(H_{fitz}\)) is fulfilled and \(\limsup _{n\rightarrow +\infty } \left( \frac{\lambda _n\beta _n}{ \mu }+\frac{\lambda _n}{ \eta } \right) < \frac{1}{2}\), then \((z_n)_{n\in \mathbb {N}\cup \{0\}}\) converges weakly to an element in \(zer(A+D+N_C)\) as \(n\rightarrow +\infty \).
Proof
The proof of the theorem is divided into three parts:
(a)
\(\sum _{n\in \mathbb {N}\cup \{0\}} \Vert y_{n} - y_{n-1} \Vert ^2 < +\infty \) and the sequence \((\Vert x_n-u\Vert )_{n\in \mathbb {N}\cup \{0\}}\) is convergent for every \(u\in zer(A+D+N_C)\);
(b)
every weak cluster point of \((z'_n)_{n\in \mathbb {N}\cup \{ 0 \}}\) lies in \(zer(A+D+N_C)\), where
By the hypothesis, we have \(\liminf _{n\rightarrow +\infty } \left( \frac{1}{2}- 2M_n^2 \right) >0\); it then follows from Lemma 2 that the sequence \(E_n:= \Vert x_n-u\Vert ^2 + M_{n-1}^2 \Vert y_{n-2}-y_{n-1}\Vert ^2\), \(n \in \mathbb {N}\), converges and \(\sum _{n\in \mathbb {N}} \Vert y_{n-1}-y_n\Vert ^2 < +\infty \). This means that \(\Vert y_{n-1}-y_n\Vert \rightarrow 0\), and so \((\Vert x_n-u\Vert )_{n\in \mathbb {N}\cup \{0\}}\) is a convergent sequence.
(b) Let z be a weak cluster point of \((z'_n)_{n\in \mathbb {N}}\). Take \((u,w)\in Gr(A+D+N_C)\) such that \(w=v+p+ D(u)\), where \(v\in A(u)\) and \(p\in N_C(u)\). Let \(n_0\in \mathbb {N}\) and let \(K\in \mathbb {N}\) with \(K\ge n_0+2\). Summing up for \(n=n_0+1, \dots ,K\) the inequalities in (9) (which are nothing else than (15) with the additional term \(2\lambda _n\langle u-y_n, w \rangle \)), with \(\limsup _{n\rightarrow +\infty } \left( \frac{\lambda _n\beta _n}{ \mu }+\frac{\lambda _n}{ \eta } \right) < \frac{1}{2}\) and \(M_n=\frac{\lambda _n\beta _n}{\mu }+\frac{\lambda _n}{\eta }\) for \(n\in \mathbb {N}\), we get that
Discarding the nonnegative term \(\Vert x_{K+1}-u\Vert ^2\) and dividing by \(2\tau _{K} = 2\sum _{n=1}^{K}\lambda _n\), we obtain
$$\begin{aligned} -\frac{\Vert x_{n_0+1}-u\Vert ^2}{2\tau _{K}}&\le \frac{R+2\langle -\sum _{n=1}^{n_0} \lambda _n u + \sum _{n=1}^{n_0} \lambda _n y_n, w \rangle }{2\tau _{K}} +\frac{2 \langle \sum _{n=1}^{K} \lambda _n u - \sum _{n=1}^{K} \lambda _n y_n, w \rangle }{2 \sum _{n=1}^{K}\lambda _n} \\&=\frac{\tilde{R}}{2\tau _K} + \langle u-z'_K, w \rangle , \end{aligned}$$
where \(\tilde{R} := R + 2 \langle -\sum _{n=1}^{n_0} \lambda _n u +\sum _{n=1}^{n_0} \lambda _n y_n, w \rangle \in \mathbb {R}\). By passing to the limit as \(K\rightarrow +\infty \) and using that \(\lim _{K\rightarrow +\infty } \tau _{K} = +\infty \), we get
Since z is a weak cluster point of \((z'_n)_{n\in \mathbb {N}\cup \{0\}}\), we obtain that \(\langle u-z, w \rangle \ge 0\). Finally as this inequality holds for arbitrary \((u,w)\in Gr(A+D+N_C)\), z lies in \(zer(A+D+N_C)\) (since \(A+D+N_C\) is maximally monotone, see preliminaries).
(c) Let \(n\in \mathbb {N}\cup \{0\}\). It is enough to prove that \(\lim _{n\rightarrow +\infty } \Vert z_n-z'_n\Vert =0\); the statement of the theorem will then be a consequence of Lemma 1. Taking \(u\in zer(A+D+N_C)\) and \(w=0=v+p+D(u)\), where \(v\in A(u)\) and \(p\in N_C(u)\), it follows from the proof of (a) and the proof of Lemma 3 (see (13)) that \(\sum _{n\in \mathbb {N}\cup \{0\}} \Vert y_{n-1}-y_n\Vert ^2 < +\infty \) and \( \Vert x_{n+1} - y_n \Vert \le \left( \frac{\lambda _n\beta _n}{\mu }+\frac{\lambda _n}{\eta } \right) \Vert y_{n-1}-y_n\Vert \), respectively. For any \(n\in \mathbb {N}\cup \{0\}\) it holds
As observed in the forward-backward-forward penalty scheme discussed in [4, 5], strong monotonicity of the operator A guarantees the strong convergence of the sequence \((x_n)_{n\in \mathbb {N}\cup \{0\}}\). However, it remains to be seen whether this result extends to the forward-backward-forward with extrapolation from the past and the penalty scheme under consideration. The following result provides the necessary clarification.
Theorem 5
Let \((x_n)_{n\in \mathbb {N}\cup \{0\}}\) and \((y_n)_{n\in \mathbb {N} \cup \{0,-1\}}\) be the sequences generated by Algorithm 1. If \((H_{fitz})\) is fulfilled, \(\limsup _{n\rightarrow +\infty } \left( \frac{\lambda _n \beta _n}{\mu } + \frac{\lambda _n}{\eta }\right) < \frac{1}{2} \) and the operator A is \(\gamma \)-strongly monotone with \(\gamma > 0\), then \((x_n)_{n\in \mathbb {N}\cup \{0\}}\) converges strongly to the unique element in \(zer (A+D+N_{C})\) as \(n\rightarrow +\infty \).
Proof
Let u be the unique element of \(zer (A+D+N_C)\) and \(w = 0 = v + p + D(u)\), where \(v\in A(u)\) and \(p\in N_C(u)\). We can follow the proof of Lemma 3 under the assumption that A is \(\gamma \)-strongly monotone with \(\gamma > 0\). This means that we obtain \(\frac{x_n-y_n}{\lambda _n} - D(y_{n-1}) - \beta _n B(y_{n-1}) \in A(y_n)\), and one can show that for each \(n\in \mathbb {N}\)
The hypothesis \(\left( \limsup \limits _{n\rightarrow +\infty } \left( \frac{\lambda _n \beta _n}{\mu } + \frac{\lambda _n}{\eta } \right) < \frac{1}{2} \right) \) implies the existence of \(n_0 \in \mathbb {N}\) such that for every \(n\ge n_0\)
Since we already know from Theorem 4 (a) that \(\sum _{n\in \mathbb {N}} \Vert y_{n} - y_{n-1} \Vert ^2 < +\infty \) and \(\left( \frac{\lambda _{n-1}\beta _{n-1}}{\mu } + \frac{\lambda _{n-1}}{\eta } \right) ^2\) is bounded from above by the hypothesis, we have
Because \(\sum _{ { n\in \mathbb {N} } }\lambda _n=+\infty \), and \(\lim \limits _{n\rightarrow + \infty } \Vert x_n - u\Vert \) exists (proof of Theorem 4 (a)), we can conclude that \(\lim \limits _{n\rightarrow + \infty } \Vert x_n - u\Vert = 0 \). \(\square \)
4 Minimax problems
The minimax optimization problem is a classical problem where the goal is to find a saddle point of a function of two variables. This problem arises in many applications, including game theory [21], control theory [22], and robust optimization [23, 24]. In recent years, there has been a growing interest in the study of minimax problems in modern machine learning, such as Generative Adversarial Networks (GANs) [11], data-driven optimization [25], and more [26].
In this section, we will consider a class of minimax problems with a convex-concave structure, which can be solved using Tseng’s forward-backward-forward method with extrapolation from the past and penalty scheme. We will present the problem formulation, discuss the properties of the objective function, and introduce some numerical methods for solving the problem.
Let \(f:\mathcal {H}\times \mathcal {G} \rightarrow \mathbb {R}\) be convex-concave and differentiable, i.e., \(f(\cdot , y)\) is convex for all \(y\in \mathcal {G}\) and \(f(x, \cdot )\) is concave for all \(x\in \mathcal {H}\). The min-max problem (or minimax problem) is the problem of the form:
Let \(F:\mathcal {H}\times \mathcal {G} \rightarrow \mathcal {H}\times \mathcal {G}\) be defined by \( F \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} \nabla _1 f(x,y) \\ -\nabla _2 f(x,y)\end{pmatrix}\), where \(\nabla _1 f \) and \(\nabla _2 f\) are the partial gradients of f with respect to the first and second components, respectively. Then, writing the optimality conditions for the inequalities in (21), we get
Moreover, F is monotone, and it is also Lipschitz continuous if we impose Lipschitz continuity on \(\nabla _1 f\) and \(\nabla _2 f\); see Proposition 17.10 in [17] and Theorem 35.1 in [27]. In the constrained case, the problem in (20) can be written as
where \(\mathcal {Q}\), \(\mathcal {Q}'\) are other real Hilbert spaces, \(K_1: \mathcal {H}\rightarrow \mathcal {Q}\), \(K_2:\mathcal {G}\rightarrow \mathcal {Q}'\) are linear continuous operators with closed ranges, \(b\in \mathcal {Q}\), \(b'\in \mathcal {Q}'\), and \(\{x\in \mathcal {X} : K_1 (x) = b\}\ne \emptyset \), \(\{y\in \mathcal {Y} : K_2 (y) = b'\}\ne \emptyset \) are nonempty sets. This kind of constrained minimax problem has been studied together with plenty of applications, for instance in [25]. In this case, we have the following characterization of saddle points of (26)
Note that \(N_{C_1\cap C_2} = \partial (\delta _{C_1} + \delta _{C_2}) = N_{C_1} + N_{C_2}\), if a constraint qualification is satisfied, for example \( \emptyset \ne C_1 \cap {\text {int}} C_2 \) (see Corollary 16.38 in [17]). Then
Due to the above setting, the problem (28) turns into the monotone inclusion in (7). The iterative pattern in Algorithm 1 then takes, for each \(n\ge 0\), the form
Notice that if G is a nonempty closed convex subset of \(\mathcal {H}\), then \(J_{N_{G}} = (Id+N_{G})^{-1} = prox_{\delta _{G}} = proj_G\) (see Example 23.4 in [17]), where \(proj_G\) is the projector (or projection operator) onto G; namely, for all \(x\in \mathcal {H}\), \(proj_G(x)\) is the unique element in G such that \(\Vert x - proj_G(x) \Vert = \min _{y\in G} \Vert x - y \Vert \). Therefore, we are able to construct a relevant algorithm for solving the monotone inclusion scheme in (32) (i.e., for solving problem (26)), demonstrated as:
Thus, we can use condition (ii) in (H) (in Section 1) to get the convergence results. We define \(\Psi : \mathcal {H} \times \mathcal {G} \rightarrow \mathbb {R}\),
which is a constraint qualification of our considered problem (see Corollary 16.38 in [17] and (27)-(28)).
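For a concrete toy instance (ours, for illustration only): with the bilinear coupling \(f(x,y) = \langle x, My \rangle \) for a matrix M, the saddle operator from above becomes \(F(x,y) = (My, -M^{\top }x)\), which is skew (hence monotone) and \(\Vert M\Vert \)-Lipschitz continuous, matching the standing assumptions. The short Python check below verifies that the monotonicity inner product vanishes:

```python
import numpy as np

# Saddle operator F(x, y) = (grad_x f, -grad_y f) for f(x, y) = <x, M y>:
# a skew (hence monotone) and ||M||-Lipschitz operator.
M = np.array([[1.0, 2.0],
              [0.0, 1.0]])

def F(x, y):
    return M @ y, -M.T @ x

rng = np.random.default_rng(1)
x, y = rng.standard_normal(2), rng.standard_normal(2)
xp, yp = rng.standard_normal(2), rng.standard_normal(2)
Fx, Fy = F(x, y)
Fxp, Fyp = F(xp, yp)
inner = (x - xp) @ (Fx - Fxp) + (y - yp) @ (Fy - Fyp)
print(f"monotonicity inner product: {inner:+.2e}")  # ~ 0 for skew operators
```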
Hence, we can propose the ergodic convergence results, which follow from Theorem 4, as below.
Corollary 6
In the framework of (26), let \(\mathcal {X}\subset \mathcal {H}, \mathcal {Y}\subset \mathcal {G}\) be nonempty closed convex sets, and let the operators \(\bar{A}, \bar{B}, \bar{D}\) and the set \(\bar{C}\) be given as in (31), where \(\bar{A}\) is maximally monotone, \(\bar{D}\) is monotone and \(\eta ^{-1}\)-Lipschitz continuous with \(\eta > 0\), \(\bar{B}\) is a monotone and \(\mu ^{-1}\)-Lipschitz continuous operator with \(\mu > 0\) (i.e., \(\mu ^{-1} := \bar{K}\), where \(\bar{K} = \max \{\Vert K_1\Vert ^2, \Vert K_2\Vert ^2\}, K_1\ne 0, K_2 \ne 0\)), \(\bar{C} :=zer \bar{B} \ne \emptyset \), and (Condition A) is satisfied. Consider the sequences generated by Algorithm 2, and the sequence \((z_n)_{n\in \mathbb {N}\cup \{0\}}\) defined as
where \((\lambda _{k})_{k\in \mathbb {N}\cup \{0\}}\) is a positive sequence such that \(\sum _{k\in \mathbb {N}\cup \{0\}} \lambda _{k} = +\infty \). If we suppose the following hypotheses:
and \(\limsup _{n\rightarrow +\infty } \left( \frac{\lambda _n \beta _n}{\mu } + \frac{\lambda _n}{\eta } \right) < \frac{1}{2}\), then \((z_n)_{n\in \mathbb {N}\cup \{0\}}\) converges weakly to a saddle point of (26).
Remark 2
1.
Considering condition (i) in \((H^{\text {min}-\text {max}})\), notice that \(\bar{A}+ N_{\bar{C}}\) is maximally monotone if a regularity condition is fulfilled, for example (Condition A) (see Corollary 24.4 in [17]). In addition, (32) has a solution if (Condition A) holds and (26) has saddle points.
2.
Notice that (ii) in \((H^{\text {min}-\text {max}})\) is fulfilled when \(\sum _{n\in \mathbb {N}\cup \{0\}} \frac{\lambda _n}{\beta _{n}} < + \infty \) and \(b=b'=0\), see Section 1.
3.
We can show the Lipschitz property of \(\bar{B}\) as follows:
where \(\bar{K} = \max \{\Vert K_1\Vert ^2, \Vert K_2\Vert ^2\}\).
5 The forward-backward-forward algorithm with extrapolation from the past and penalty scheme for the problem involving composition of linear continuous operators
In this section, we propose a forward-backward-forward algorithm with extrapolation from the past and a penalty scheme to address the inclusion problem involving a finite sum of compositions of monotone operators with linear continuous operators. To this end, we begin with the following problem:
Problem 3: Let \(\mathcal {H}\) be a real Hilbert space, \(A : \mathcal {H}\rightrightarrows \mathcal {H}\) be a maximally monotone operator, and \(D : \mathcal {H}\rightarrow \mathcal {H}\) be a monotone and \(\nu \)-Lipschitz continuous operator with \(\nu >0\). Let m be a strictly positive integer and, for any \(i\in \{ 1,\dots ,m \}\), let \(\mathcal {G}_i\) be a real Hilbert space, \(B_i : \mathcal {G}_i \rightrightarrows \mathcal {G}_i\) be a maximally monotone operator, and \(L_i : \mathcal {H}\rightarrow \mathcal {G}_i\) be a nonzero linear continuous operator. Assume that \(B:\mathcal {H}\rightarrow \mathcal {H}\) is a monotone and \(\mu ^{-1}\)-Lipschitz continuous operator with \(\mu >0\), and suppose \(C = zerB\ne \emptyset \). The monotone inclusion problem to solve is
The monotone inclusion problem, Problem 3, can be reformulated into the same form as Problem 1 using the product space approach with a pertinent setting. To address this problem, we work with the product space \(\mathcal {H} \times \mathcal {G}_1 \times \dots \times \mathcal {G}_m\), where the inner product and associated norm are defined for every element \((x,v_1,\dots ,v_m)\) by \(\langle (x,v_1,\dots ,v_m), (y,w_1,\dots ,w_m) \rangle = \langle x, y \rangle + \sum _{i=1}^{m} \langle v_i, w_i \rangle \) and \(\Vert (x,v_1,\dots ,v_m)\Vert = \sqrt{\Vert x \Vert ^2 + \sum _{i=1}^{m}\Vert v_i \Vert ^2}\). We proceed to define the operators on the product space \(\mathcal {H} \times \mathcal {G}_1 \times \dots \times \mathcal {G}_m\) as follows: for every \((x,v_1,\dots ,v_m), (y,w_1,\dots ,w_m) \in \mathcal {H} \times \mathcal {G}_1 \times \dots \times \mathcal {G}_m\), \(\tilde{A}: \mathcal {H} \times \mathcal {G}_1 \times \dots \times \mathcal {G}_m \rightrightarrows \mathcal {H} \times \mathcal {G}_1 \times \dots \times \mathcal {G}_m\)
Note that the maximal monotonicity of A and \(B_i\), \(i=1,\dots ,m\), implies that the operator \(\tilde{A}\) is also maximally monotone, as stated in Propositions 20.22 and 20.23 of [17]. Moreover, according to Theorem 3.1 in [28], it is possible to demonstrate that the operator \(\tilde{D}\) is monotone and \(\beta \)-Lipschitz continuous, where \(\beta = \nu + \sqrt{\sum _{i=1}^{m} \Vert L_i\Vert ^2 }\) (in this context, the monotonicity and Lipschitz continuity of \(\tilde{D}\) is a special case of [4, 28]). We also have that \(\tilde{B}\) is monotone and \(\mu ^{-1}\)-Lipschitz continuous with
$$\begin{aligned} \tilde{C} := zer \tilde{B} = zer {B} \times \mathcal {G}_1 \times \dots \times \mathcal {G}_m = C \times \mathcal {G}_1 \times \dots \times \mathcal {G}_m . \end{aligned}$$
One can show (see [4]) that x is a solution to Problem 3 if and only if there exist \(v_1\in \mathcal {G}_1,\dots ,v_m\in \mathcal {G}_m\) such that \((x,v_1,\dots ,v_m) \in zer(\tilde{A} + \tilde{D} + N_{\tilde{C}})\). On the other hand, when \((x,v_1,\dots ,v_m) \in zer(\tilde{A} + \tilde{D} + N_{\tilde{C}})\), then \(x\in zer(A + \sum _{i=1}^{m} L_i^{*} B_i L_i + D + N_C)\). This means that determining the zeros of \(\tilde{A} + \tilde{D} + N_{\tilde{C}}\) will automatically provide a solution to Problem 3.
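The resolvent of \(\tilde{A}\) splits componentwise (this is the identity from Proposition 23.16 of [17] invoked below). As a hedged Python sketch (ours; `prox_f` and the entries of `prox_gs` are assumed callables with signature `prox(z, t)` computing \(\text {prox}_{tg}(z)\), for the subdifferential case \(A=\partial f\), \(B_i = \partial g_i\)), the resolvents of the inverses \(B_i^{-1} = \partial g_i^{*}\) follow from the Moreau decomposition \(\text {prox}_{\lambda g^*}(v) = v - \lambda \, \text {prox}_{g/\lambda }(v/\lambda )\):

```python
def resolvent_product(x, vs, lam, prox_f, prox_gs):
    # J_{lam * A_tilde}(x, v_1, ..., v_m)
    #   = (J_{lam A}(x), J_{lam B_1^{-1}}(v_1), ..., J_{lam B_m^{-1}}(v_m)).
    # With A = subdiff f and B_i = subdiff g_i, the inverse resolvents come
    # from Moreau: prox_{lam g*}(v) = v - lam * prox_{g/lam}(v/lam).
    x_new = prox_f(x, lam)
    vs_new = [v - lam * prox_g(v / lam, 1.0 / lam)
              for v, prox_g in zip(vs, prox_gs)]
    return x_new, vs_new
```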
By using the identity given in Proposition 23.16 of [17], which provides that for any \((x,v_1,\dots ,v_m)\in \mathcal {H} \times \mathcal {G}_1 \times \dots \times \mathcal {G}_m\) and \(\lambda > 0\), \(J_{\lambda \tilde{A}} (x,v_1,\dots ,v_m) = (J_{\lambda A}(x), J_{\lambda B_{1}^{-1}} (v_1),\dots ,J_{\lambda B_{m}^{-1}} (v_m))\), it can be observed that the iterations of Algorithm 1 are given for any \(n\in \mathbb {N}\cup \{0\}\) as follows:
To consider the convergence of this iterative scheme, we need the following additional hypotheses, similar to the hypotheses in Section 2 of [4]: \((H_{fitz}^{sum}){\left\{ \begin{array}{ll} (i) & A +N_C \;\text {is maximally monotone and} \; zer(A+\sum _{i=1}^{m} L_i^*B_i L_i+D+N_C)\ne \emptyset ;\\ (ii) & \text {For every } p\in {\text {ran}} N_C, \sum _{n\in \mathbb {N}\cup \{0\}} \lambda _n\beta _n [\sup \limits _{u\in C} \varphi _{B} (u,\frac{p}{\beta _n})-\sigma _C(\frac{p}{\beta _n})] < +\infty ;\\ (iii) & (\lambda _n)_{ n\in \mathbb {N}\cup \{0\} } \in \ell ^2 \setminus \ell ^1. \end{array}\right. }\)
The Fitzpatrick function of \(\tilde{B}\) (i.e., \(\varphi _{\tilde{B}}\)) and the support function of \(\tilde{C}\) (i.e., \(\sigma _{\tilde{C}}\)) can be computed in the same way as demonstrated in Section 2 of [4] for arbitrary elements \((x_1,v_{1},\dots ,v_{m}), (x_1',v_{1}',\dots ,v_{m}')\in \mathcal {H} \times \mathcal {G}_1 \times \dots \times \mathcal {G}_m\). Moreover, the satisfaction of condition (i) in \((H_{fitz}^{sum})\) guarantees that \(\tilde{A} + N_{\tilde{C}}\) is maximally monotone and \(zer(\tilde{A}+\tilde{D}+N_{\tilde{C}})\ne \emptyset \). As a result, we can apply Theorems 4 and 5 to the problem of finding the zeros of \(\tilde{A}+\tilde{D}+N_{\tilde{C}}\). Thus, we can establish the convergence results for the scheme involving sums of monotone operators composed with linear continuous operators, which are given below.
Theorem 7
Let \((x_n)_{n\in \mathbb {N}\cup \{0\} }\) and \((z_n)_{n\in \mathbb {N}\cup \{0\} }\) be sequences generated by Algorithm 3. Assume \((H_{fitz}^{sum})\) is fulfilled and \(\limsup _{n\rightarrow +\infty } (\frac{\lambda _n \beta _n}{\mu } + \lambda _n \beta ) < \frac{1}{2}\), where
Then \((z_n)_{n\in \mathbb {N}\cup \{0\} }\) converges weakly to an element in \(zer(A + \sum _{i=1}^{m} L_i^{*} B_iL_i + D + N_C)\) as \(n\rightarrow \infty \). If, additionally, A and \(B^{-1}_i, i=1,\dots ,m\) are strongly monotone, then \((x_n)_{n\in \mathbb {N}\cup \{0\} }\) converges strongly to the unique element in \(zer(A + \sum _{i=1}^{m} L_i^{*} B_iL_i + D + N_C)\) as \(n\rightarrow \infty \).
Remark 3
In the case \(m=1\), our considered problem becomes the monotone inclusion problem involving compositions with linear continuous operators suggested in Section 3.2 of [5], in the context of the iterative method based on Tseng's forward-backward-forward algorithm with extrapolation from the past and penalty scheme.
6 Convex minimization problem
Problem 4 Let \(\mathcal {H}\) be a real Hilbert space, \(f\in \Gamma (\mathcal {H})\), and \(h:\mathcal {H} \rightarrow \mathbb {R}\) be a convex and differentiable function with \(\nu \)-Lipschitz continuous gradient for \(\nu >0\). Let m be a strictly positive integer, and for any \(i = 1,\dots ,m\), let \(\mathcal {G}_i\) be a real Hilbert space, \(g_i\in \Gamma (\mathcal {G}_i)\),
In order to solve this problem in our context, we can follow Problem 3 in Section 5. We know that any element belonging to \(zer(\partial f + \sum _{i=1}^{m} L^*_i \partial g_i L_i + \nabla h + N_C)\) is an optimal solution of (39) if we substitute the operators as
$$\begin{aligned}&A =\partial f, \; B = \nabla \Psi ,\; C = \arg \min \Psi = zer B,\; D = \nabla h, \;\text {and}\; \nonumber \\&B_i = \partial g_i, i = 1,\dots ,m, \end{aligned}$$
(40)
where \(B\) is a monotone and \(\mu ^{-1}\)-Lipschitz continuous operator (see Proposition 17.10 and Theorem 18.15 in [17]). However, the converse is true only if a suitable qualification condition is satisfied. Conditions under which (39) leads to Problem 3 with (37) have been given previously in [4] (for example, see Proposition 4.3 and Remark 4.4 in [28]).
Note that the strong quasi-relative interior of a convex set S in a real Hilbert space \(\mathcal {H}\), denoted by sqri S, is defined as the set of points \(x\in S\) such that the union of all positive scalar multiples of \((S-x)\) is a closed linear subspace of \(\mathcal {H}\). Notice that sqri S always contains the interior of S, denoted by int S, but this inclusion can be strict. When \(\mathcal {H}\) is finite-dimensional, sqri S coincides with the relative interior of S, denoted by ri S, which is the interior of S with respect to its affine hull. Moreover, the condition in (41) is fulfilled if
dom \(g_i = \mathcal {G}_i, i = 1,\dots ,m\) or
\(\mathcal {H}\) and \(\mathcal {G}_i\) are finite dimensional and there exists \(x \in {\text {ri}}\,{\text {dom}} f \cap {\text {ri}}\, C\) such that \( L_i (x) \in {\text {ri}}\,{\text {dom}}\, g_i\), \(i=1,\dots ,m\) (see Proposition 4.3 of [28]).
At this point, we are equipped to state the iterative method based on Tseng's forward-backward-forward algorithm with extrapolation from the past and penalty scheme, and its convergence result.
For the hypothesis \((H^{opt})\), some observations have already been presented in [4], which are as follows:
Remark 4
1.
Due to Corollary 24.4 in [17], \(\partial f + N_C\) is maximally monotone if \(0\in {\text {sqri}} ({\text {dom}} f - C)\). This condition is fulfilled if the conditions in Proposition 6.19 and Proposition 15.24 of [17] hold, for instance, \(0\in {\text {int}} ({\text {dom}}f -C )\), or f is continuous at a point in \({\text {dom}} f \cap C\), or \({\text {int}} C \cap {\text {dom}} f \ne \emptyset \).
2.
Condition (ii) of \((H^{opt})\) is similar to condition (ii) of (H), which implies the second condition of \((H_{fitz})\), as we mentioned in Section 1. Hence, we can apply the convergence results proposed in Section 3 in this context.
With the same effort as in Section 5, we may continue and provide the convergence results for Algorithm 4 as shown below.
Theorem 8
Let \((x_n)_{n\in \mathbb {N}\cup \{0\}}\) be the sequence generated by Algorithm 4 and \((z_n)_{n\in \mathbb {N}\cup \{0\}}\) be the sequence defined in (6). If \((H^{opt})\) and (41) are fulfilled and \(\limsup _{n\rightarrow +\infty } \left( \frac{\lambda _n \beta _n}{\mu } + \lambda _n \beta \right) < \frac{1}{2}\), where
then \((z_n)_{n\in \mathbb {N}\cup \{0\}}\) converges weakly to an optimal solution to (39) as \(n\rightarrow +\infty \). If, additionally, f and \(g_i^{*}\), \(i =1,\dots ,m\), are strongly convex, then \((x_n)_{n\in \mathbb {N}\cup \{0\}}\) converges strongly to the unique optimal solution of the problem in (39) as \(n\rightarrow +\infty \).
Remark 5
1.
According to Proposition 17.10 and Theorem 18.15 in [17], \(g^*\) is strongly convex whenever \(g: \mathcal {H}\rightarrow \mathbb {R}\) is differentiable with Lipschitz continuous gradient.
2.
If \(\Psi (x) = 0\) for all \(x\in \mathcal {H}\), then Algorithm 4 reduces to the error-free version of the iterative scheme proposed in [15], where all of the involved variable matrices are identity matrices. This scheme is used to solve the convex minimization problem given by
7 A numerical experiment in TV-based image inpainting
In this section, we demonstrate the application of Algorithm 4 for solving an image inpainting problem, which involves recovering lost information in an image. We compare the classical Tseng's type (or forward-backward-forward (FBF)) penalty scheme, which Boț and Csetnek proposed in [4], with our modified Tseng's type penalty scheme. Because our algorithm removes redundant computations from the classical Tseng's method (see the introduction), we expect reduced computation time in the following numerical experiment. The computations presented in this part were carried out with Python (version 3.7.9) on a Windows desktop computer powered by an Intel(R) Core(TM) i5-8250U processor operating at 1.6 GHz to 1.8 GHz and equipped with 8.00 GB of RAM. We represent images of size \(M\times N\) as vectors \(x\in \mathbb {R}^{\bar{n}}\), where \(\bar{n} = M\cdot N\), and each pixel, denoted by \(x_{i,j}\), \(1 \le i \le M\), \(1 \le j \le N\), takes values in the closed interval from 0 (pure black) to 1 (pure white). Given an image \(b\in \mathbb {R}^{\bar{n}}\) with missing pixels (which are set to black in this case), we define \(P\in \mathbb {R}^{\bar{n}\times \bar{n}}\) as the diagonal matrix with \(P_{i,i}=0\) if the \(i\)-th pixel in the noisy image \(b\) is missing, and \(P_{i,i} = 1\) otherwise, for \(i = 1,\dots ,\bar{n}\) (noting that \(\Vert P\Vert =1\)). The original image is reconstructed by solving the following TV-regularized model:
The function \(TV_{iso}:\mathbb {R}^{\bar{n}}\rightarrow \mathbb {R}\), which we use as our objective function, is defined as the isotropic total variation:
It is possible to express the problem presented in (43) as an optimization problem of the form of Problem 4 in (39). This can be achieved by introducing the set \(\mathcal {Y}=\mathbb {R}^{\bar{n}}\times \mathbb {R}^{\bar{n}}\) and defining the linear operator \(L:\mathbb {R}^{\bar{n}}\rightarrow \mathcal {Y}\) as \(L(x_{i,j}) = ( L^1 (x_{i,j}),L^2 (x_{i,j}) )\), where
where \(f: \mathbb {R}^{\bar{n}}\rightarrow \bar{\mathbb {R}}\), \(f = \delta _{[0,1]^{\bar{n}}}\), and \(g_1: \mathcal {Y}\rightarrow \mathbb {R}\), \(g_1(y_1,y_2) = \Vert (y_1,y_2) \Vert _{\times }\). The problem given in (44) can be written as Problem 4 (39) when we set \(m=1\), \(L_1=L\), and \(h=0\). It is worth noting that \(\nabla \Psi (x) = P(P(x)-b) = P(x-b)\) for all \(x\in \mathbb {R}^{\bar{n}}\), which makes \(\nabla \Psi \) Lipschitz continuous with Lipschitz constant 1 (i.e., \(\mu =1\)). Therefore, the iterative scheme in Algorithm 4 can be written as follows for every \(n\ge 0\) in this particular case:
where \(S = \{ (p,q) \in \mathcal {Y} : \max _{\begin{array}{c} 1\le i \le M \\ 1 \le j \le N \end{array}} \sqrt{p_{i,j}^2 + q_{i,j}^2} \le 1 \}\). The projection operator \(proj_{S} : \mathcal {Y} \rightarrow S\) is defined via
$$\begin{aligned} (p_{i,j},q_{i,j}) \mapsto \frac{(p_{i,j}, q_{i,j})}{\max \{ 1, \sqrt{p_{i,j}^2 + q_{i,j}^2} \}},\; 1\le i \le M, \; 1\le j \le N. \end{aligned}$$
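In a hedged Python form (ours; dual variables stored as \(M\times N\) arrays for convenience, with `mask` holding the diagonal of P), the projection above and the penalty gradient \(\nabla \Psi (x) = P(x-b)\) read:

```python
import numpy as np

def proj_S(p, q):
    # Pixelwise projection onto S = {(p, q) : sqrt(p_ij^2 + q_ij^2) <= 1},
    # exactly the mapping displayed above.
    scale = np.maximum(1.0, np.sqrt(p ** 2 + q ** 2))
    return p / scale, q / scale

def grad_penalty(x, b, mask):
    # grad Psi(x) = P(x - b); `mask` is the diagonal of P
    # (1 = observed pixel, 0 = missing), so ||P|| = 1.
    return mask * (x - b)
```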
The quality of the reconstructed images was compared using the Improvement in Signal-to-Noise Ratio (ISNR), defined as follows: \( \text {ISNR}(n) = 10\log _{10} \left( \frac{\Vert x-b\Vert ^2}{\Vert x-x_n\Vert ^2} \right) , \)
where x, b and \(x_n\) denote the original image, the image with missing pixels, and the recovered image at iteration n, respectively.
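A matching hedged implementation (ours) of this metric:

```python
import numpy as np

def isnr(x, b, x_n):
    # Improvement in Signal-to-Noise Ratio (dB) of the iterate x_n.
    return 10.0 * np.log10(np.sum((x - b) ** 2) / np.sum((x - x_n) ** 2))
```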
Fig. 1
TV-based image inpainting results: the original image, the image with \(80\%\) of its pixels missing, the non-averaged reconstructed image \(x_n\), and the averaged reconstructed image \(z_n\) obtained after 2000 iterations using FBF and FBF-EP with the two different values of \(\lambda _n\), shown in the first, second, and third columns, respectively
Fig. 2
ISNR curves for both the averaged and non-averaged reconstructed images using the FBF method with \(\lambda _n =0.9\cdot n^{-0.75}\) and the FBF-EP method with \(\lambda _n =0.9 \cdot 2^{-1} \cdot n^{-0.75}\)
Fig. 3
ISNR curves for both the averaged and non-averaged reconstructed images using the FBF and FBF-EP methods with the same value of \(\lambda _n =0.9 \cdot 2^{-1} \cdot n^{-0.75}\)
Table 1 ISNR and CPU time (in seconds) for the averaged and non-averaged reconstructed images obtained through 2000 iterations of the FBF and FBF-EP methods

Method | \(\lambda _n\) | ISNR (Average) | ISNR (Non-average) | CPU-time (Average) | CPU-time (Non-average)
FBF | \(0.9\cdot n^{-0.75}\) | 11.41243 | 10.92022 | 91.59540 | 88.66624
FBF | \(0.9\cdot 2^{-1}\cdot n^{-0.75}\) | 11.36461 | 8.78285 | 91.33789 | 88.41237
FBF-EP | \(0.9\cdot 2^{-1}\cdot n^{-0.75}\) | 11.39179 | 8.77024 | 77.89488 | 74.59753
We evaluated the performance of the algorithms on a \(256\times 256\) pixel image of the Pisa Tower, using the parameters \(\lambda _n = 0.9 \cdot n^{-0.75}\) (see [4]), \(\lambda _n = 0.9 \cdot 2^{-1}\cdot n^{-0.75}\), and \(\beta _n = n^{0.75}\) for all \(n \in \mathbb {N}\). Both the original image and an image with \(80\%\) randomly blacked-out pixels were used in the test. Figure 1 displays the original image, the noisy image, the non-averaged reconstructed image \(x_n\), and the averaged reconstructed image \(z_n\) after 2000 iterations using the forward-backward-forward (FBF) method in the penalty scheme with \(\lambda _n = 0.9 \cdot n^{-0.75}\), FBF in the penalty scheme with \(\lambda _n = 0.9 \cdot 2^{-1} \cdot n^{-0.75}\), and forward-backward-forward with extrapolation from the past (FBF-EP) in the penalty scheme with \(\lambda _n = 0.9 \cdot 2^{-1} \cdot n^{-0.75}\). Note that \(\lambda _n = 0.9 \cdot n^{-0.75}\) cannot be used for FBF-EP in the penalty scheme because there is no theorem to support its convergence. In addition, since the parameters \(\lambda _n\) and \(\beta _n\) play the role of step sizes in the algorithms, we chose values close to the upper bounds of their respective conditions (namely, either \(\limsup _{n\rightarrow +\infty } \left( \frac{\lambda _n \beta _n}{\mu } + \lambda _n \beta \right) < 1\) or \(\limsup _{n\rightarrow +\infty } \left( \frac{\lambda _n \beta _n}{\mu } + \lambda _n \beta \right) < \frac{1}{2}\)).
Figures 2 and 3 depict the evolution of the ISNR values for both the averaged and non-averaged reconstructed images using the FBF and FBF-EP algorithms (with the two values of \(\lambda _n\) for FBF, respectively). The theoretical outcomes concerning the sequences involved in Theorem 8 are illustrated in the figures, indicating that the averaged sequence exhibits better convergence properties than the non-averaged sequence. We notice that the ISNR values of both algorithms behave similarly when we use the same value of \(\lambda _n\), and the ISNR value of averaged FBF-EP slightly outperforms at iteration 2000. Table 1 reveals that the FBF-EP approach produced the averaged reconstructed image with an ISNR value of 11.39179, while it was 11.36461 for the averaged case of FBF with \(\lambda _n =0.9 \cdot 2^{-1} \cdot n^{-0.75}\) and 11.41243 for the averaged case of FBF with \(\lambda _n =0.9 \cdot n^{-0.75}\). However, the non-averaged reconstructed image using the FBF-EP method had the lowest ISNR value of 8.77024. Furthermore, the FBF-EP method was faster than the FBF method by around 13 seconds (compared with both values of \(\lambda _n\) in FBF), indicating that it can offer time-saving benefits in image reconstruction.
Declarations
Conflicts of Interest
The author declares that they have no conflict of interest.
Ethical Approval
Not Applicable.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Boț, R.I., Csetnek, E.R.: A Tseng’s type penalty scheme for solving inclusion problems involving linearly composed and parallel-sum type monotone operators. Vietnam J. Math. 42, 451–465 (2013). https://doi.org/10.1007/s10013-013-0050-2
Böhm, A., Sedlmayer, M., Csetnek, E.R., Boț, R.I.: Two steps at a time—taking GAN training in stride with Tseng’s method. SIAM J. Math. Data Sci. 4, 750–771 (2022). https://doi.org/10.1137/21M1420939
Popov, L.D.: A modification of the Arrow-Hurwicz method for search of saddle points. Mathematical Notes of the Academy of Sciences of the USSR. 28, 845–848 (1980). https://doi.org/10.1007/BF01141092
17. Bauschke, H.H., Combettes, P.L.: Convex Analysis and Monotone Operator Theory in Hilbert Spaces. CMS Books in Mathematics, Springer, Berlin (2011)
18. Fitzpatrick, S.: Representing monotone operators by convex functions. Proc. Centre Math. Appl. 59–65 (1988)
19. Bauschke, H.H., McLaren, D.A., Sendov, H.S.: Fitzpatrick functions: inequalities, examples, and remarks on a problem by S. Fitzpatrick. J. Convex Anal. 13, 499–523 (2006)
20. Rockafellar, R.T.: On the maximal monotonicity of subdifferential mappings. Pacific J. Math. 33, 209–216 (1970)
Zhang, G.: Understanding minimax optimization in modern machine learning. UWSpace. http://hdl.handle.net/10012/17157 (2021). Accessed 12 Apr 2024
27. Rockafellar, R.T.: Convex Analysis. Princeton University Press, New Jersey (1970)
28. Combettes, P.L., Pesquet, J.-C.: Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators. Set-Valued Var. Anal. 20, 307–330 (2012). https://doi.org/10.1007/s11228-011-0191-y