Journal of Visual Communication and Image Representation
A comparison of three total variation based texture extraction models☆
Introduction
Let f be an observed image that contains texture and/or noise. Texture is characterized as repeated, meaningful structure of small patterns. Noise is characterized as uncorrelated random patterns. The rest of an image, which is called cartoon, contains object hues and sharp edges (boundaries). Thus an image f can be decomposed as f = u + v, where u represents the image cartoon and v is texture and/or noise. A general way to obtain this decomposition using the variational approach is to solve the problem min{TV(u) | ∥u − f∥B ⩽ σ}, where TV(u) denotes the total variation of u and ∥·∥B is a norm (or semi-norm). The total variation of u, which is defined below in Section 1.1, is minimized to regularize u while keeping edges like object boundaries of f in u (i.e., while allowing discontinuities in u). The fidelity constraint ∥u − f∥B ⩽ σ forces u to stay close to f.
The Banach space BV of functions of bounded variation is important in image processing because such functions are allowed to have discontinuities and hence keep edges. This can be seen from its definition as follows.
Let u ∈ L1, and define [39]

TV(u) = sup{ ∫ u div φ dx : φ ∈ C1c(ℝ², ℝ²), |φ(x)| ⩽ 1 for all x }

as the total variation of u, where |·| denotes the Euclidean norm. Also, u ∈ BV if TV(u) < ∞. In the above definition, C1c(ℝ², ℝ²), the set of compactly supported continuously differentiable vector-valued functions, serves as a test set for u. If u is in the Sobolev space W1,1 or H1, it follows from integration by parts that TV(u) is equal to ∫|∇u|, where ∇u is the weak derivative of u. However, the use of test functions to define TV(u) allows u to have discontinuities. Therefore, BV is larger than W1,1 and H1. Equipped with the norm ∥u∥BV = ∥u∥L1 + TV(u), BV is a Banach space.
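For intuition, the discretization of TV(u) by forward finite differences, used later in the paper, can be sketched as follows (a minimal illustration; the function name `tv` and the grids are ours). For a sharp step, the discrete TV equals the jump height times the length of the interface, which is why TV regularization preserves edges:

```python
import math

def tv(u):
    """Isotropic discrete total variation of a 2-D grid u:
    sum over pixels of the Euclidean norm of the forward-difference gradient."""
    n, m = len(u), len(u[0])
    total = 0.0
    for i in range(n):
        for j in range(m):
            dx = u[i + 1][j] - u[i][j] if i + 1 < n else 0.0
            dy = u[i][j + 1] - u[i][j] if j + 1 < m else 0.0
            total += math.hypot(dx, dy)
    return total

# A sharp vertical step: each of the 4 rows contributes one unit jump.
step = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
print(tv(step))  # -> 4.0
```

Note that a constant image has zero TV, so the regularizer penalizes only variation, not intensity.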
BV(Ω) with Ω being a bounded open domain is defined analogously to BV, with L1 and C1c(ℝ², ℝ²) replaced by L1(Ω) and C1c(Ω, ℝ²), respectively.
Next, we define the space G [27]. Let G denote the Banach space consisting of all generalized functions v(x) defined on ℝ² that can be written as

v = div(g), g = (g1, g2), g1, g2 ∈ L∞(ℝ²). (1)
Its norm ∥v∥G is defined as the infimum of the L∞ norm of |g| = √(g1² + g2²) over all decompositions (1) of v; in short,

∥v∥G = inf{ ∥√(g1² + g2²)∥L∞ : v = div(g) }.

G is the dual of a closed subspace of BV [27]. We note that finite difference approximations to functions in BV and in this subspace are the same. For the definition and properties of G(Ω), see [5].
An immediate result of the above definitions is that

∫ u v dx ⩽ TV(u) ∥v∥G (2)

holds for any u ∈ BV with compact support and v ∈ G. We say (u, v) is an extremal pair if (2) holds with equality.
In image processing, the space BV and the total variation semi-norm were first used by Rudin, Osher, and Fatemi [33] to remove noise from images. Specifically, their model obtains a cleaner image u ∈ BV of a noisy image f by letting u be the minimizer of

min{ TV(u) + λ∥f − u∥²L2 },

in which the regularizing term TV(u) tends to reduce the oscillations in u and the data fidelity term ∥f − u∥²L2 tends to keep u close to f.
The ROF model is the precursor to a large number of image processing models having a similar form. Among the recent total variation-based cartoon-texture decomposition models, Meyer [27] and Haddad and Meyer [20] proposed using the G-norm defined above, Vese and Osher [35] approximated the G-norm by the div(Lp)-norm, Osher, Sole and Vese [32] proposed using the H−1-norm, Lieu and Vese [26] proposed using the more general H−s-norm, and Le and Vese [24] and Garnett, Le, Meyer and Vese [18] proposed using the homogeneous Besov spaces Ḃs p,q, −2 < s < 0, 1 ⩽ p, q ⩽ ∞, extending Meyer's Ḃ−1 ∞,∞, to model the oscillation component of an image. In addition, Chan and Esedoglu [12] and Yin, Goldfarb and Osher [38] used the L1-norm together with total variation, following the earlier work by Alliney [2], [3], [4] and Nikolova [29], [30], [31].
In this subsection we present three cartoon-texture decomposition models that are based on the minimization of total variation. We suggest that readers interested in the theoretical analysis of these models read the referenced works mentioned below and in the introduction. Although the analysis of the existence and uniqueness of solutions and duality/conjugacy is not within the scope of our discussion, in Section 3 we relate the differences among the image results from these models to the distinct properties of the three fidelity terms: ∥f − u∥L1, ∥f − u∥G, and the approximation of the latter by Vese and Osher.
In the rest of the paper, we assume the input image f has compact support contained in a bounded convex open set Ω. In our tests, Ω is an open square.
In [2], [3], [4], [30], [31], [12], [37] the square of the L2 norm of f − u in the fidelity term in the original ROF model is replaced by the L1 norm of f − u, which yields the following problem:

min{ TV(u) | ∫|f − u| ⩽ σ }. (3)
The above constrained minimization problem (3) is equivalent to its Lagrangian relaxed form

min{ TV(u) + λ∫|f − u| }, (4)

where λ is the Lagrange multiplier of the constraint ∫|f − u| ⩽ σ. The two problems have the same solution if λ is chosen equal to the optimal value of the dual variable corresponding to the constraint in the constrained problem. Given σ or λ, we can calculate the other value by solving the corresponding problem. The same result also holds for Meyer's model below.
We chose to solve the Lagrangian relaxed version (4), rather than the constrained version (3), in our numerical experiments because several researchers [12], [37] have established the relationship between λ and the scale of f − u∗. For example, for the disk signal f = 1B(0,r) centered at the origin with radius r, f − u∗ = 1B(0,r) for 0 < λ < 2/r, while f − u∗ vanishes for λ > 2/r. Although this model appears to be simpler than Meyer's model and the Vese–Osher model below, it has recently been shown to have very interesting properties such as morphological invariance and texture extraction by scale [12], [37]. These properties are important in various applications in biomedical engineering and computer vision such as background correction [36], face recognition [14], [15], and brain MR image registration [13]. In Section 3, we demonstrate the ability of the TV-L1 model to separate out features of a certain scale in an image.
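The 2/r threshold can be illustrated with simple arithmetic: for f = 1B(0,r), keeping the disk in u costs TV(f) = 2πr (the perimeter), while sending it to v costs λ times the area πr². A minimal sketch (function name ours; we compare only these two candidate solutions, which is the comparison suggested by the scale results in [12], [37]):

```python
import math

def tv_l1_energy_disk(r, lam, keep):
    """Energy TV(u) + lam * ||f - u||_L1 for f = indicator of a disk of
    radius r, comparing u = f ('keep') with u = 0 ('drop')."""
    if keep:   # u = f: pay the perimeter in TV, zero fidelity cost
        return 2 * math.pi * r
    else:      # u = 0: zero TV, pay lam times the disk area
        return lam * math.pi * r ** 2

r = 5.0
for lam in (0.3, 0.5):   # the threshold is 2/r = 0.4
    keep = tv_l1_energy_disk(r, lam, True) < tv_l1_energy_disk(r, lam, False)
    print(lam, "disk kept in u" if keep else "disk sent to v")
```

Below the threshold the disk is cheaper to send to v; above it, cheaper to keep in u, matching the stated behavior of f − u∗.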
In addition to the SOCP approach that we use in the paper to solve (4) numerically, the graph-based approaches [17], [11] were recently shown to be very efficient in solving an approximate version of (4).
To extract cartoon u in the space BV and texture and/or noise v as an oscillating function, Meyer [27] proposed the following model:

min{ TV(u) | ∥f − u∥G ⩽ σ }. (5)
As we have pointed out in Section 1.1, G is the dual space of a closed subspace of BV, so G is closely connected to BV. Meyer gave a few examples in [27], including the one shown at the end of the next paragraph, illustrating the appropriateness of modeling oscillating patterns by functions in G.
Unfortunately, it is not possible to write down the Euler–Lagrange equations for the Lagrangian form of Meyer's model (6), min{TV(u) + λ∥f − u∥G}, and hence to use a straightforward partial differential equation method to solve it. Alternatively, several models [5], [6], [32], [35] have been proposed to solve (6) approximately. The Vese–Osher model [35] described in the next subsection approximates ∥v∥G by ∥√(g1² + g2²)∥Lp with 1 ⩽ p < ∞. The Osher–Sole–Vese model [32] replaces ∥v∥G by the Hilbert-space functional ∥v∥²H−1. The more recent A2BC model [5], [7], [6] is inspired by Chambolle's projection algorithm [10] and minimizes TV(u) + (1/2λ)∥f − u − v∥²L2 for (u, v) ∈ BV × {v ∈ G : ∥v∥G ⩽ μ}. Similar projection algorithms proposed in [9], [8] are also used to approximately solve (4) and (6). Recently, Kindermann and Osher [21] showed that (6) is equivalent to a minimax problem and proposed a numerical method to solve this saddle-point problem. Other numerical approaches based on the dual representation of the G-norm are introduced in [16] by Chung, Le, Lieu, Tanushev, and Vese, [25] by Lieu, and [23] by Le, Lieu, and Vese. In [34], Starck, Elad, and Donoho use sparse basis pursuit to achieve a similar decomposition. In Section 2, we present SOCP-based optimization models to solve both (5) and (6) exactly (i.e., without any approximation or regularization applied to the non-smooth terms ∫|∇u| and ∥v∥G except for the use of finite differences). In contrast to our choice for the TV-L1 model, we chose to solve (5) with specified σ's in our numerical experiments because setting an upper bound on ∥f − u∥G is more meaningful than penalizing ∥f − u∥G. The following example demonstrates that ∥v∥G is inversely proportional to the strength of the oscillations of v: let v(x) = cos(tx), which oscillates more strongly for larger t; one can show ∥v∥G = 1/t because v = (sin(tx)/t)′ and ∥sin(tx)/t∥L∞ = 1/t. Therefore, to separate a signal with oscillations stronger than a specific level from f, it is more straightforward to solve the constrained problem (5).
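The ∥cos(tx)∥G = 1/t example can be checked numerically in one dimension, where div(g) reduces to a derivative: the G-norm is the sup-norm of the antiderivative g(x) = sin(tx)/t. A small sketch (function name and grid are ours):

```python
import math

def g_norm_cos(t, samples=100000):
    """Estimate ||cos(t x)||_G in 1-D by taking the sup over a fine grid of
    |g(x)| with g(x) = sin(t x) / t, the antiderivative of cos(t x)."""
    return max(abs(math.sin(t * x / samples * 2 * math.pi) / t)
               for x in range(samples))

for t in (1, 5, 25):
    print(t, round(g_norm_cos(t), 4))  # -> approximately 1/t
```

The faster the oscillation, the smaller the G-norm, which is exactly why the G fidelity term attracts oscillatory components into v.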
To calculate the G-norm of a function f alone, one can choose to solve an SOCP or use the dual method by Kindermann, Osher and Xu [22]. The authors of the latter work exploit (2) to develop a level-set based iterative method.
Motivated by the definition of the L∞ norm of |g| = √(g1² + g2²) as the limit

∥|g|∥L∞ = lim p→∞ ∥|g|∥Lp,

Vese and Osher [35] proposed the following approximation to Meyer's model (5):

min{ VOp(u, g) = TV(u) + λ∥f − u − div(g)∥²L2 + μ∥√(g1² + g2²)∥Lp }, (8)

where p ⩾ 1.
In [35], minimizing VOp with respect to u, g1, and g2 yields an associated system of Euler–Lagrange equations.
In [35], the authors solve the above system of partial differential equations for different values of p, with 1 ⩽ p ⩽ 10, via a sequential descent approach and claim that they give very similar numerical results.
The VO model (8) can be viewed as a relaxation of Meyer's model (5), since the requirement f − u = div(g) is relaxed by penalizing its violation in the L2 norm and ∥v∥G is approximated by ∥√(g1² + g2²)∥Lp. This point is clearly illustrated by the numerical comparisons between these two models presented in Section 3.
The purpose of this paper is to accurately compute and compare the three TV-based models presented above using a uniform approach. To do this we formulate and solve all of the above three models as second-order cone programs (SOCPs). These formulations do not require the use of regularization to handle the non-smoothness of these models. In this subsection, we give a short introduction to SOCP and the use of interior-point methods to solve SOCPs.
In an SOCP the vector of variables x ∈ ℝn is composed of subvectors – i.e., x ≡ (x1; x2; …; xr) – where n = n1 + n2 + … + nr and each subvector xi must lie either in an elementary second-order cone of dimension ni,

Ki = { xi = (xi0; x̄i) ∈ ℝ × ℝ^(ni−1) : xi0 ⩾ ∥x̄i∥ },

or in an ni-dimensional rotated second-order cone

Kir = { xi = (xi1; xi2; x̄i) ∈ ℝ × ℝ × ℝ^(ni−2) : 2 xi1 xi2 ⩾ ∥x̄i∥², xi1, xi2 ⩾ 0 }.
Note that a rotated second-order cone is the image of an elementary second-order cone under a linear transformation; i.e., (x1; x2; x̄) ∈ Kr if and only if ((x1 + x2)/√2; (x1 − x2)/√2; x̄) ∈ K.
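These definitions can be checked mechanically. The sketch below (function names ours) tests membership in both cones and applies the linear map above; expanding ((x1+x2)² − (x1−x2)²)/2 = 2x1x2 shows why the map carries one cone onto the other:

```python
import math

def in_soc(x):
    """Elementary second-order cone: x = (x0; xbar), x0 >= ||xbar||."""
    return x[0] >= math.hypot(*x[1:]) - 1e-12

def in_rotated_soc(x):
    """Rotated cone: x = (x1; x2; xbar), 2*x1*x2 >= ||xbar||^2, x1, x2 >= 0."""
    return (x[0] >= 0 and x[1] >= 0
            and 2 * x[0] * x[1] >= sum(t * t for t in x[2:]) - 1e-12)

def rotate(x):
    """Linear map sending the rotated cone onto the elementary cone:
    (x1, x2, xbar) -> ((x1 + x2)/sqrt(2), (x1 - x2)/sqrt(2), xbar)."""
    s = math.sqrt(2.0)
    return [(x[0] + x[1]) / s, (x[0] - x[1]) / s] + list(x[2:])

x = [1.0, 2.0, 1.5, 1.0]   # 2*1*2 = 4 >= 1.5^2 + 1^2 = 3.25
print(in_rotated_soc(x), in_soc(rotate(x)))  # -> True True
```

Rotated cones are convenient for modeling quadratic terms such as the L2 fidelity in the ROF and VO models.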
With these definitions an SOCP can be written in the following form [1]:

min c1ᵀx1 + … + crᵀxr subject to A1x1 + … + Arxr = b, xi ∈ Ki or Kir, i = 1, …, r, (13)

where ci ∈ ℝ^ni and Ai ∈ ℝ^(m×ni), for i = 1, …, r, and b ∈ ℝ^m.
Since a one-dimensional second-order cone corresponds to a semi-infinite ray, SOCPs can accommodate nonnegative variables. In fact if all cones are one-dimensional, then the above SOCP is just a standard form linear program. As is the case for linear programs, SOCPs can be solved in polynomial time by interior point methods. This is the approach that we take to solve the TV-based cartoon-texture decomposition models in this paper.
Over the past two decades there has been extensive research on and development of interior-point methods for solving linear programs. In the last few years this research and development has been extended to SOCPs. Consequently, these problems can now be solved efficiently both in practice and in theory (in polynomial time). Moreover, interior-point SOCP methods often yield highly accurate solutions. The optimality conditions for the SOCP (13) are

A1x1 + … + Arxr = b, xi ∈ Ki (or Kir), (primal feasibility)
Aiᵀy + si = ci, si ∈ Ki (or Kir), i = 1, …, r, (dual feasibility)
xiᵀsi = 0, i = 1, …, r, (complementarity) (14)

where y and (s1; …; sr) are the dual variables.
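A toy numerical check of these conditions (our own example, not from the paper): minimize x0 over the elementary cone with x1 = 3 and x2 = 4 fixed by the equality constraints. The optimum is x* = (5, 3, 4) with dual y = (3/5, 4/5), and the slack s = c − Aᵀy lies on the cone boundary with zero duality gap:

```python
import math

# Tiny SOCP: minimize x0 subject to x1 = 3, x2 = 4,
# with (x0, x1, x2) in the elementary second-order cone.
x = (5.0, 3.0, 4.0)                     # primal optimum
y = (0.6, 0.8)                          # duals for the two equality rows
c = (1.0, 0.0, 0.0)
s = (c[0], c[1] - y[0], c[2] - y[1])    # s = c - A^T y (A picks out x1, x2)

primal_feasible = (x[1], x[2]) == (3.0, 4.0) and x[0] >= math.hypot(x[1], x[2])
dual_feasible = s[0] >= math.hypot(s[1], s[2]) - 1e-12
gap = sum(a * b for a, b in zip(x, s))  # complementarity x^T s

print(primal_feasible, dual_feasible, abs(gap) < 1e-9)  # -> True True True
```

Both x and s sit on the boundary of the cone, which is the conic analogue of complementary slackness in linear programming.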
Interior-point methods for SOCPs approximately solve a sequence of perturbed optimality conditions (the "0" on the right-hand side of the third block in (14), which is equal to the duality gap, is replaced by a positive scalar μ) by taking single damped Newton steps (Δx1, …, Δxr; Δy; Δs1, …, Δsr) while making sure that the new iterates remain interior (i.e., xi + Δxi and si + Δsi are strictly inside their respective cones). The iteration stops once a new iterate satisfies certain prescribed stopping conditions, such as the duality gap μ falling below a tolerance. The typical number of iterations is between 15 and 50 and is usually fairly independent of the problem size. Moreover, in [19] it is shown that each interior-point iteration takes O(n³) time and O(n² log n) bytes for solving an SOCP formulation of the Rudin–Osher–Fatemi model [33].
Preliminaries
In practice grey-scale images are represented as 2-dimensional matrices, whose elements give the grey-scale values of corresponding pixels. In this paper we restrict our discussion to square domains in ℝ² and hence n × n real matrices (denoted by Mn×n) for the sake of simplicity.
Let f ∈ Mn×n be an observed image and let u denote the cartoon and v denote the texture and/or noise in f such that (f, u, v) satisfies

fi,j = ui,j + vi,j, 1 ⩽ i, j ⩽ n,

where fi,j, ui,j, and vi,j are, respectively, the grey-scale values at pixel (i, j).
Numerical results
In this section, we present numerical results for the three cartoon-texture decomposition models and compare them. In all cases we solved the Lagrangian version (4) of the TV-L1 model, the constraint version (5) of Meyer’s model, and the residual-free version (45) of the Vese–Osher (VO) model (8) with p = 1.
We used the commercial optimization package Mosek [28] as our SOCP solver. Mosek is designed to solve a variety of large-scale optimization problems, including SOCPs. Before solving a
Conclusions
In this work, we have computationally studied three total variation based models with discrete inputs: the Meyer, VO, and TV-L1 models. We showed how to formulate these models as second-order cone programs, which we solved by an interior-point method. We tested these models using a variety of 1D signals and 2D images to reveal their differences in decomposing inputs into their cartoon and oscillating/small-scale/texture parts. The Meyer model tends to capture the pattern of the oscillations in
References (39)
- L.I. Rudin, S. Osher, E. Fatemi, Nonlinear total variation based noise removal algorithms, Physica D (1992).
- F. Alizadeh, D. Goldfarb, Second-order cone programming, Mathematical Programming (2003).
- S. Alliney, Digital filters as absolute norm regularizers, IEEE Transactions on Signal Processing (1992).
- S. Alliney, Recursive median filters of increasing order: a variational approach, IEEE Transactions on Signal Processing (1996).
- S. Alliney, A property of the minimum vectors of a regularizing functional defined by means of the absolute norm, IEEE Transactions on Signal Processing (1997).
- Modeling very oscillating signals, application to image processing, Applied Mathematics and Optimization (2005).
- J.F. Aujol, G. Aubert, L. Blanc-Feraud, A. Chambolle, Decomposing an image: application to textured images and SAR...
- Image decomposition into a bounded variation component and an oscillating component, Journal of Mathematical Imaging and Vision (2005).
- Dual norms and image decomposition models, International Journal of Computer Vision (2005).
- Structure-texture image decomposition: modeling, algorithms, and parameter selection, International Journal of Computer Vision (2006).
- An algorithm for total variation minimization and applications, Journal of Mathematical Imaging and Vision.
- Aspects of total variation regularized L1 function approximation, SIAM Journal on Applied Mathematics.
- Total variation models for variable lighting face recognition, IEEE Transactions on Pattern Analysis and Machine Intelligence (PAMI).
☆ Research supported by NSF Grants DMS-01-04282, DMS-03-12222 and ACI-03-21917, ONR Grants N00014-03-1-0514 and N00014-03-0071, and DOE Grant GE-FG01-92ER-25126.