1 Introduction
Regularization theory constitutes a powerful framework for the derivation of algorithms for supervised learning [
14,
41,
42]. Given a collection of data points
\(({\varvec{x}}_m, y_m) \in {\mathbb {R}}^d \times {\mathbb {R}}\),
\(m=1,\dots ,M\), the basic problem (regression) is to find a mapping
\(f: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\) such that
\(f({\varvec{x}}_m)\approx y_m\), without overfitting. The standard paradigm is to let
f be the minimizer of a cost that consists of a data-fidelity term and an additive regularization functional [
8]. The minimization proceeds over a prescribed class
\({\mathcal {H}}\) of candidate functions. One usually distinguishes between the parametric approaches (e.g., neural networks), where
\({\mathcal {H}}={\mathcal {H}}_{\varTheta }\) is a family of functions specified by a finite set of parameters
\(\theta \in \varTheta \) (e.g., the weights of the network), and the nonparametric ones, where the properties of the solution are controlled by the regularization functional. The focus of this paper is on the nonparametric techniques. They rely on
functional optimization, which means that the minimization proceeds over a space of functions rather than over a set of parameters. The regularization is usually chosen to be an increasing function of the norm associated with a particular Banach space, which results in a well-posed problem [
9,
10,
56].
The functional-optimization point of view is often constructive, in that it suggests or supports explicit learning architectures. For instance, the choice of the Hilbertian regularization
\(R(f)=\Vert f\Vert ^2_{{\mathcal {H}}}\), where
\({\mathcal {H}}\) is a reproducing-kernel Hilbert space (RKHS) results in a closed-form solution that is a linear combination of kernels positioned on the data [
7,
62]. In fact, the RKHS setting yields a generic class of estimators that is compatible with the classical kernel-based methods of machine learning, including support vector machines [
1,
41,
49,
50,
56,
62]. Likewise, adaptive kernel methods can be justified through the minimization of a generalized total-variation norm, which favors sparse representations [
3,
11,
12]. These latter results actually have their roots in spline theory [
18,
28,
60]. Similarly, it has been demonstrated that shallow ReLU networks are solutions of functional-optimization problems with an appropriate regularization. One way to achieve this is to start from an explicit parameterization of an infinite-width network [
4] (the reverse engineering/synthesis approach). Another way is to consider a regularization operator that is matched to the neuronal activation, together with an
\(L_1\)-type penalty; for instance, a second-order derivative for
\(d=1\) [
36,
48] or, more generally, the Radon-domain counterpart of the Laplace operator whose Green’s function is precisely a ReLU ridge [
35,
37,
57]. Similar optimality results can be stated within the framework of reproducing-kernel Banach spaces [
6], which is a formal point of view that bridges the synthesis and analysis approaches of [
4] and [
37], respectively. Also relevant to the discussion is a variational formulation that links the ridgelet transform to the training of shallow neural networks with weight-decay regularization [
53].
The second important benefit of the functional-optimization approach is that it gives insight into the approximation capabilities (expressivity) of the resulting learning architectures. This information is encapsulated in the definition of the native space
\({\mathcal {H}}\) (typically, a Sobolev space), which goes hand-in-hand with the regularization functional. Roughly speaking, the native space
\({\mathcal {H}}\) ought to be “large enough” to allow for the approximation of any continuous function with an arbitrary degree of precision. This universal approximation property is a central theme in the theory of radial-basis functions (RBFs) [
31,
63]. In machine learning, the kernel estimators that meet this approximation requirement are called
universal [
32]. When the basis functions are shifted replicates of a single template
\(h: {\mathbb {R}}^d \rightarrow {\mathbb {R}}\), then the condition is equivalent to
h being strictly positive definite, which means that its Fourier transform is real-valued, symmetric, and (strictly) positive [
13]. Similar guarantees of universal approximation exist for (shallow) neural networks under mild conditions on the activation functions [
5,
16,
25,
30,
39]. The main difference from the RKHS framework, however, is that the universality results for neural nets usually assume that the input domain is a compact subset of
\({\mathbb {R}}^d\).
The purpose of this paper is to unify and extend these various approaches by introducing a universal regularization functional. The latter has two components: an admissible differential operator
\(\textrm{L}\), and an
\(L_p\)-type Radon-domain norm. The resulting regularization operator is
\(\textrm{L}_{\textrm{R}}={\textrm{K}}_{\textrm{rad}}\text {RL}\), where
\({\textrm{R}}\) is the Radon transform and
\({\textrm{K}}_{\textrm{rad}}\) the “filtering” operator of computed tomography [
33]. Our main result (Theorem
5) gives the parametric form of the solution of the corresponding functional-optimization problems under minimal hypotheses. For
\(p=2\), the outcome is compatible with the type of kernel expansions (RBFs) of classical machine learning for which there is a vast literature [
24,
52]. For
\(p=1\), the solution set is parameterized by a neural network with one hidden layer whose activation function is determined by the regularization operator. In particular, if we take
\(\textrm{L}\) to be the Laplacian, then one retrieves the popular ReLU activation. Remarkably, the connection with neural networks also works the other way round: Parhi et al. [
36,
38] proved that the training of a sufficiently wide shallow ReLU neural network with weight-decay regularization converges to the solution of a functional-optimization problem that is a special instance of the class considered in this paper.
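For concreteness, here is a minimal NumPy sketch (ours, with hypothetical names and randomly chosen parameters rather than a trained solution) of the parametric form that arises in this ReLU setting: a one-hidden-layer network complemented by the affine (skip) term required in the \(n_0=1\) case discussed in the contributions below.

```python
import numpy as np

def shallow_relu_net(x, W, b, v, a, c):
    """Evaluate f(x) = sum_k v_k * ReLU(w_k^T x - b_k) + a^T x + c:
    a one-hidden-layer ReLU network plus an affine (skip) term,
    as in the n0 = 1 case discussed in the text."""
    z = W @ x - b                      # pre-activations w_k^T x - b_k
    return v @ np.maximum(z, 0.0) + a @ x + c

# Tiny usage example with arbitrary parameters (d = 3 inputs, K = 5 neurons).
rng = np.random.default_rng(0)
d, K = 3, 5
W = rng.standard_normal((K, d))
W /= np.linalg.norm(W, axis=1, keepdims=True)   # directions on the unit sphere
b, v = rng.standard_normal(K), rng.standard_normal(K)
a, c = rng.standard_normal(d), 0.1
print(shallow_relu_net(rng.standard_normal(d), W, b, v, a, c))
```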
The foundation for our characterization is an abstract representer theorem for direct-sum Banach spaces [
58]. Thus, the primary effort in this paper consists in the development of a functional framework that is adapted to the Radon transform and that fulfills the hypotheses of the abstract theorem. The main contributions can be summarized as follows.
1.
Construction and characterization of an extended family of native Banach spaces
\({\mathcal {X}}'_{{\textrm{L}}_{\textrm{R}}}({\mathbb {R}}^d)\) associated with a generic Radon-domain norm
\(\Vert \cdot \Vert _{{\mathcal {X}}'}\) and a differential operator
\(\textrm{L}\), under the general admissibility conditions stated in Definition
3 (Theorem
6).
2.
Proof that: (i) the sampling functionals
\(\delta (\cdot -{\varvec{x}}_m): {\mathcal {X}}'_{{\textrm{L}}_{\textrm{R}}}({\mathbb {R}}^d) \rightarrow {\mathbb {R}}\) are weak*-continuous; and (ii) the adjoint of the regularization operator has a stable generalized inverse
\(\textrm{L}^{*\dagger }_{\textrm{R}}\) (see Theorem
7 and accompanying explanations). These technical points are essential to the argument (existence of a solution).
3.
Extension and unification of a number of earlier optimality results for RBF expansions and neural networks. While the present setup for
\(p=2\) and
\(\textrm{L}=(-\varDelta )^{\gamma }\) is reminiscent of thin-plate splines [
17,
29], the resulting solution for a fixed
\(\gamma \) does not depend on the dimension
d, which makes it easier to deploy. Likewise, our variational formulation with
\({\mathcal {X}}'={\mathcal {M}}\) extends the results of Parhi and Nowak [
37] by: (i) proving that the neural network parameterization applies to
all the extreme points of the solution set; and (ii) covering a much broader class of activation functions, including those with polynomial growth (of degree
\(n_0\)).
4.
General guarantees of universality, subject to the admissibility condition in Definition
3. While the result for
\(p=2\) is consistent with the known criteria for kernel estimators [
32], its counterpart for neural networks
\(({\mathcal {X}}'={\mathcal {M}})\) brings in a new twist: the addition of a polynomial component. The latter, which is not present in the traditional theory [
5,
39], is necessary to lift the hypothesis of a compact input domain. The two cases of greatest practical relevance are the sigmoid and the ReLU activations, which, in our formulation, require the addition of a bias (
\(n_0=0\)) and an affine term
\((n_0=1)\), respectively. Interestingly, the ReLU case yields a neural architecture with a skip connection akin to ResNet [
22], which is highly popular in practice.
The paper is organized as follows: We start with mathematical preliminaries in Sect.
2. In particular, we state our criteria of admissibility for
\(\textrm{L}\) and show how to represent its polynomial null space. In Sect.
3, we review the main properties of the Radon transform and specify the dual pair
\(({\mathcal {X}}_{\textrm{Rad}}, {\mathcal {X}}'_{\textrm{Rad}})\) of hyper-spherical Banach spaces that enter the definition of our native spaces. We also provide formulas for the (filtered) Radon transform of RBFs and ridges (the elementary constituents of neural networks). Section
4 is devoted to the description and interpretation of our main result (Theorem
5). In particular, we draw a connection with RKHS in Sect.
4.2. We discuss the issue of universality in Sect.
4.3 and show in Sect.
4.4 how our framework can be extended to handle anti-symmetric activations, including sigmoids. We complement our exposition in Sect.
4.5 with a listing of specific configurations, many of which are intimately connected to splines. The mathematical developments that support our formulation are presented in Sect.
5. They include the characterization of the kernel of the inverse operator
\({\textrm{L}}^{*\dagger }_{\textrm{R}}\) (the enabling ingredient of our formulation) and the construction of the predual Banach space
\({\mathcal {X}}_{{\textrm{L}}_{\textrm{R}}}({\mathbb {R}}^d)\).
The Radon transform extracts the integrals of a function on
\({\mathbb {R}}^d\) over all hyperplanes of dimension
\((d-1)\). These hyperplanes are indexed over
\({\mathbb {R}}\times {\mathbb {S}}^{d-1}\), where
\({\mathbb {S}}^{d-1}=\{{\varvec{\xi }}\in {\mathbb {R}}^d: \Vert {\varvec{\xi }}\Vert _2=1\}\) is the unit sphere in
\({\mathbb {R}}^d\). The coordinates of a hyperplane associated with an offset
\(t\in {\mathbb {R}}\) and a normal vector
\({\varvec{\xi }}\in {\mathbb {S}}^{d-1}\) satisfy
$$\begin{aligned} {\varvec{\xi }}^{\textsf{T}}{\varvec{x}}=\xi _1x_1+ \dots + \xi _d x_d = t. \end{aligned}$$
Here, we first review the classical theory of the Radon transform [
27], starting with the case of test functions (Sect.
3.1), and extending it by duality to tempered distributions (Sect.
3.2). Then, in Sect.
3.3, we specify the Radon transform and its inverse on an appropriate class of intermediate Banach spaces
\({\mathcal {Y}}\) with \({\mathcal {S}}_{\textrm{Rad}} \hookrightarrow {\mathcal {Y}} \hookrightarrow {\mathcal {S}}'_{\textrm{Rad}}\) (Theorem
3). Finally, in Sect.
3.4, we provide the (filtered) Radon transforms of the dictionary elements—isotropic kernels and ridges—that are relevant to our investigation.
The Radon transform of the function
\(f\in L_1({\mathbb {R}}^d)\cap C_0({\mathbb {R}}^d)\) is defined as
$$\begin{aligned} {\textrm{R}}\{ f\}(t, {\varvec{\xi }})&=\int _{{\mathbb {R}}^d}\delta (t-{\varvec{\xi }}^{\textsf{T}}{\varvec{x}}) f({\varvec{x}}) \textrm{d}{\varvec{x}},\quad (t,{\varvec{\xi }}) \in {\mathbb {R}}\times {\mathbb {S}}^{d-1}. \end{aligned}$$
(18)
The adjoint of
\({\textrm{R}}\) is the backprojection operator
\({\textrm{R}}^*\). Its action on
\(g: {\mathbb {R}}\times {\mathbb {S}}^{d-1} \rightarrow {\mathbb {R}}\) yields the function
$$\begin{aligned} {\textrm{R}}^*\{g\}({\varvec{x}})=\int _{{\mathbb {S}}^{d-1}} g(\underbrace{{\varvec{\xi }}^{\textsf{T}}{\varvec{x}}}_{t}, {\varvec{\xi }})\textrm{d}{\varvec{\xi }}, \quad {\varvec{x}}\in {\mathbb {R}}^d. \end{aligned}$$
(19)
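Definition (18) can be probed numerically. The following sketch (ours) evaluates \({\textrm{R}}\{f\}(t,{\varvec{\xi }})\) in dimension \(d=2\) as a line integral and checks it against the closed-form Radon transform of an isotropic Gaussian.

```python
import numpy as np

def radon_2d(f, t, xi, s_max=10.0, n=4001):
    """Numerically evaluate R{f}(t, xi) in d = 2 according to (18), i.e.,
    integrate f over the line {x in R^2 : xi^T x = t}."""
    xi = np.asarray(xi, dtype=float)
    xi_perp = np.array([-xi[1], xi[0]])          # unit vector orthogonal to xi
    s = np.linspace(-s_max, s_max, n)            # arclength parameter
    pts = t * xi[None, :] + s[:, None] * xi_perp[None, :]
    return np.sum(f(pts)) * (s[1] - s[0])        # Riemann sum along the line

# Sanity check with the isotropic Gaussian f(x) = exp(-||x||^2/2), for which
# R{f}(t, xi) = sqrt(2*pi) * exp(-t^2/2), independently of the direction xi.
gauss = lambda x: np.exp(-0.5 * np.sum(x**2, axis=-1))
t0 = 0.7
xi0 = np.array([np.cos(0.3), np.sin(0.3)])
print(radon_2d(gauss, t0, xi0))                   # ~1.962
print(np.sqrt(2 * np.pi) * np.exp(-0.5 * t0**2))  # closed form, same value
```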
Given the
d-dimensional Fourier transform
\({{\hat{f}}}\) of
\(f \in L_1({\mathbb {R}}^d)\), one can calculate
\( {\textrm{R}} \{f\}(\cdot ,{\varvec{\xi }}_0)\) for any fixed
\({\varvec{\xi }}_0 \in {\mathbb {S}}^{d-1}\) through the relation
$$\begin{aligned} {\textrm{R}}\{ f\}(t, {\varvec{\xi }}_0)=\frac{1}{2 \pi } \int _{{\mathbb {R}}} {{\hat{f}}}(\omega {\varvec{\xi }}_0) \textrm{e}^{\textrm{i}\omega t} \textrm{d}\omega = {\mathcal {F}}_{\textrm{1D}}^{-1}\{ {{\hat{f}}}(\cdot {\varvec{\xi }}_0) \}\{t\}. \end{aligned}$$
(20)
In other words, the restriction of
\({{\hat{f}}}: {\mathbb {R}}^d \rightarrow {\mathbb {C}}\) along the ray
\(\{{\varvec{\omega }}=\omega {\varvec{\xi }}_0: \omega \in {\mathbb {R}}\}\) coincides with the 1D Fourier transform of
\({\textrm{R}}\{ f\}(\cdot , {\varvec{\xi }}_0)\), a property that is referred to as the
Fourier-slice theorem.
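As an elementary worked example (ours), take the isotropic Gaussian \(f({\varvec{x}})=\textrm{e}^{-\Vert {\varvec{x}}\Vert _2^2/2}\), whose Fourier transform is \({{\hat{f}}}({\varvec{\omega }})=(2\pi )^{d/2}\textrm{e}^{-\Vert {\varvec{\omega }}\Vert _2^2/2}\). The restriction of \({{\hat{f}}}\) to the ray \(\omega {\varvec{\xi }}_0\) depends on \(\omega \) only, so that (20) yields
$$\begin{aligned} {\textrm{R}}\{ f\}(t, {\varvec{\xi }}_0)=\frac{1}{2 \pi } \int _{{\mathbb {R}}} (2\pi )^{d/2}\textrm{e}^{-\omega ^2/2} \textrm{e}^{\textrm{i}\omega t} \textrm{d}\omega =(2\pi )^{\frac{d-1}{2}}\textrm{e}^{-t^2/2}, \end{aligned}$$
which is precisely the integral of f over the hyperplane \(\{{\varvec{x}}\in {\mathbb {R}}^d: {\varvec{\xi }}_0^{\textsf{T}}{\varvec{x}}=t\}\), for every direction \({\varvec{\xi }}_0\).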
To describe the functional properties of the Radon transform, one needs the (hyper)spherical (or Radon-domain) counterparts of the spaces described in Sect.
2.1. There, the Euclidean indexing with
\({\varvec{x}} \in {\mathbb {R}}^d\) must be replaced by
\((t, {\varvec{\xi }}) \in {\mathbb {R}}\times {\mathbb {S}}^{d-1}\).
The spherical counterpart of
\({\mathcal {S}}({\mathbb {R}}^d)\) is
\({\mathcal {S}}({\mathbb {R}}\times {\mathbb {S}}^{d-1})\). Correspondingly, an element
\(g \in {\mathcal {S}}'({\mathbb {R}}\times {\mathbb {S}}^{d-1})\) is a continuous linear functional on
\({\mathcal {S}}({\mathbb {R}}\times {\mathbb {S}}^{d-1})\) whose action on the test function
\(\phi \) is represented by the duality product
\(g: \phi \mapsto \langle g,\phi \rangle _{\textrm{Rad}}\). When
g can be identified as an ordinary function
\(g: (t,{\varvec{\xi }}) \mapsto g(t,{\varvec{\xi }})\in {\mathbb {R}}\), one can write that
$$\begin{aligned} \langle g,\phi \rangle _{\textrm{Rad}} = \int _{{\mathbb {S}}^{d-1}} \int _{{\mathbb {R}}} g(t, {\varvec{\xi }}) \phi (t, {\varvec{\xi }}) \textrm{d}t \textrm{d}{\varvec{\xi }}\end{aligned}$$
(21)
where
\(\textrm{d}{\varvec{\xi }}\) stands for the surface element on the unit sphere
\({\mathbb {S}}^{d-1}\).
The key property for analysis is that the Radon transform is continuous on
\({\mathcal {S}}({\mathbb {R}}^d)\) and invertible [
23,
27,
43]. In addition to a backprojection, the inversion involves the so-called filtering operator.
The filtering operator is isotropic and linear shift-invariant (LSI) and, as such, has a Radon-domain counterpart (see Definition 5), denoted by \({\textrm{K}}_{\textrm{rad}}\), that acts exclusively along the radial variable.
Based on this result, we can identify the filtering operator as \({\textrm{K}}=({\textrm{R}}^*{\textrm{R}})^{-1}=c_d(-\varDelta )^{\tfrac{d-1}{2}}\) (fractional Laplacian). Alternatively, one can perform the filtering in the Radon domain by means of the operator \({\textrm{K}}_{\textrm{rad}}\), which implements a 1D convolution along the radial variable. The connection is that the frequency response of \({\textrm{K}}_{\textrm{rad}}\) coincides with the radial frequency profile of \({\textrm{K}}\) so that \({{\widehat{K}}}({\varvec{\omega }})={{\widehat{K}}}_{\textrm{rad}}(\Vert {\varvec{\omega }}\Vert )\) with \({{\widehat{K}}}_{\textrm{rad}}(\omega )=c_d|\omega |^{d-1}\).
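For a discretized sinogram, \({\textrm{K}}_{\textrm{rad}}\) can be implemented with an FFT along the radial variable. Below is a minimal NumPy sketch (ours), which assumes the standard normalization \(c_d=\frac{1}{2(2\pi )^{d-1}}\); for \(d=2\), the resulting frequency profile is the classical ramp filter of filtered backprojection.

```python
import numpy as np

def filter_sinogram(g, dt, d=2):
    """Apply the radial filter K_rad, with frequency response
    c_d * |omega|**(d - 1), to each row g[j, :] ~ g(t, xi_j) of a sampled
    sinogram via an FFT along t.  (Sketch; c_d = 1/(2*(2*pi)**(d-1)) is
    assumed, and dt is the sampling step of the radial variable.)"""
    n = g.shape[-1]
    omega = 2 * np.pi * np.fft.fftfreq(n, d=dt)      # angular frequencies
    c_d = 1.0 / (2 * (2 * np.pi) ** (d - 1))
    K_hat = c_d * np.abs(omega) ** (d - 1)           # ramp profile when d = 2
    return np.fft.ifft(np.fft.fft(g, axis=-1) * K_hat, axis=-1).real
```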
While the Radon transform
\({\textrm{R}}: {\mathcal {S}}({\mathbb {R}}^d) \rightarrow {\mathcal {S}}({\mathbb {R}}\times {\mathbb {S}}^{d-1})\) is injective and, hence, invertible on its range, it is not surjective, which means that not every hyper-spherical test function
\(\phi \in {\mathcal {S}}({\mathbb {R}}\times {\mathbb {S}}^{d-1})\) can be written as
\(\phi ={\textrm{R}}\{\varphi \}\) with
\(\varphi \in {\mathcal {S}}({\mathbb {R}}^d)\). A necessary condition is that
\(\phi \) be even, but this is not sufficient [
20,
23,
27]. The good news, however, is that the range of
\({\textrm{R}}\) on
\({\mathcal {S}}({\mathbb {R}}^d)\) is a closed subspace of
\({\mathcal {S}}({\mathbb {R}}\times {\mathbb {S}}^{d-1})\) [
23, p. 60]. Accordingly, one can identify the range space
\({\mathcal {S}}_{\textrm{Rad}}{\mathop {=}\limits ^{\vartriangle }}{\textrm{R}}\big ({\mathcal {S}}({\mathbb {R}}^d) \big )\) equipped with the Fréchet topology of
\({\mathcal {S}}({\mathbb {R}}\times {\mathbb {S}}^{d-1})\). Since the domain and range spaces are both Fréchet, we then invoke the open-mapping theorem [
45, Theorem 2.11] to deduce that the transform
\(\varphi \mapsto {\textrm{R}}\{\varphi \}\) is a homeomorphism of
\({\mathcal {S}}({\mathbb {R}}^d)\) onto
\({\mathcal {S}}_{\textrm{Rad}}\).
3.2 Distributional Extension
To extend the framework to distributions, one proceeds by duality. By invoking the property that
\({\textrm{R}}^*{\textrm{K}}_{\textrm{rad}}{\textrm{R}}=\textrm{Id}\) on
\({\mathcal {S}}({\mathbb {R}}^d)\), we perform the manipulation
$$\begin{aligned} \forall \varphi \in {\mathcal {S}}({\mathbb {R}}^d)\quad \langle f,\varphi \rangle&=\langle f,{\textrm{R}}^*{\textrm{K}}_{\textrm{rad}}{\textrm{R}}\{\varphi \} \rangle \nonumber \\&= \langle {\textrm{R}}\{f\},{\textrm{K}}_{\textrm{rad}}{\textrm{R}}\{ \varphi \}\rangle _{\textrm{Rad}} =\langle {\textrm{R}}\{f\}, \phi \rangle _{\textrm{Rad}}, \end{aligned}$$
(24)
with
\(\phi ={\textrm{K}}_{\textrm{rad}}{\textrm{R}}\{\varphi \} \in {\textrm{K}}_{\textrm{rad}}{\textrm{R}}\big ({\mathcal {S}}({\mathbb {R}}^d)\big )\) and
\(\varphi ={\textrm{R}}^*\{\phi \}\). Relation (
24), which is valid in the classical sense for
\(f \in L_1({\mathbb {R}}^d)\), is then used as definition to extend the scope of
\({\textrm{R}}\) for
\(f\in {\mathcal {S}}'({\mathbb {R}}^d)\).
While (
27) identifies
\({\textrm{R}}^*\{g\}\) as a single, unique distribution in
\({\mathcal {S}}'({\mathbb {R}}^d)\), this is not so for (
26) (resp., (
25)), as the members of
\({\mathcal {S}}'_{\textrm{Rad}}\) (resp., of
\(\big ( {\textrm{K}}_{\textrm{rad}}{\textrm{R}}\big ( {\mathcal {S}})\big )'\)) are equivalence classes in
\({\mathcal {S}}'({\mathbb {R}}\times {\mathbb {S}}^{d-1})\). To make this explicit, we take advantage of the equivalence
\({\textrm{R}}^*\{g\}=0 \Leftrightarrow \langle g, \phi \rangle _{\textrm{Rad}}=0\) for all \(\phi \in {\mathcal {S}}_{\textrm{Rad}}\) to identify the null space of the backprojection operator as
$$\begin{aligned} {\mathcal {N}}_{{\textrm{R}}^*}=\{g \in {\mathcal {S}}'({\mathbb {R}}\times {\mathbb {S}}^{d-1}): \langle g, \phi \rangle _{\textrm{Rad}}=0, \forall \phi \in {\mathcal {S}}_{\textrm{Rad}}\}, \end{aligned}$$
(28)
which is a closed subspace of
\({\mathcal {S}}'({\mathbb {R}}\times {\mathbb {S}}^{d-1})\). It is then possible to describe
\({\mathcal {S}}'_{\textrm{Rad}}\) as the abstract quotient space
\({\mathcal {S}}'({\mathbb {R}}\times {\mathbb {S}}^{d-1})/{\mathcal {N}}_{{\textrm{R}}^*}\). In other words, if we find a hyper-spherical distribution
\(g_0\in {\mathcal {S}}'({\mathbb {R}}\times {\mathbb {S}}^{d-1})\) such that (
26) is met for a given
\(f \in {\mathcal {S}}'({\mathbb {R}}^d)\), then, strictly speaking,
\({\textrm{K}}_{\textrm{rad}}{\textrm{R}}\{f\} \in {\mathcal {S}}'_{\textrm{Rad}}\) is the equivalence class (or coset) given by
$$\begin{aligned} {\textrm{K}}_{\textrm{rad}}{\textrm{R}}\{f\}=[g_0]=\{g_0 + h: h \in {\mathcal {N}}_{{\textrm{R}}^*}\}. \end{aligned}$$
(29)
Since
\([g_0]=[g]\) for any
\(g\in {\textrm{K}}_{\textrm{rad}}{\textrm{R}}\{f\}\), we refer to the members of
\({\textrm{K}}_{\textrm{rad}}{\textrm{R}}\{f\}\) as “formal” filtered projections of
f to remind us of this lack of uniqueness.
Based on those definitions, one obtains the classical result on the invertibility of the (filtered) Radon transform on
\({\mathcal {S}}'({\mathbb {R}}^d)\) [
27], which is the dual of Corollary
1.
To illustrate the fact that (
26) does not identify a single distribution, we consider the Dirac ridge
\(\delta ({\varvec{\xi }}_0^{\textsf{T}}{\varvec{x}} - t_0) \in {\mathcal {S}}'({\mathbb {R}}^d)\) and refer to the definition (
18) of the Radon transform to deduce that, for all
\(\phi ={\textrm{R}}\{\varphi \} \in {\mathcal {S}}_{\textrm{Rad}}\) with
\(\varphi \in {\mathcal {S}}({\mathbb {R}}^d)\),
$$\begin{aligned} \langle \delta ({\varvec{\xi }}_0^{\textsf{T}}\cdot - t_0),{\textrm{R}}^*{\textrm{K}}_{\textrm{rad}}\{\phi \}\rangle&=\langle \delta ({\varvec{\xi }}_0^{\textsf{T}}\cdot - t_0), \overbrace{{\textrm{R}}^*{\textrm{K}}_{\textrm{rad}}{\textrm{R}}}^{\textrm{Id}}\{\varphi \}\rangle \\&= \int _{{\mathbb {R}}^d}\delta ({\varvec{\xi }}_0^{\textsf{T}}{\varvec{x}}-t_0) \varphi ({\varvec{x}}) \textrm{d}{\varvec{x}}={\textrm{R}}\{\varphi \}(-t_0,-{\varvec{\xi }}_0)\\&=\langle \delta \big (\cdot +(t_0,{\varvec{\xi }}_0)\big ),{\textrm{R}}\{\varphi \}\rangle _{\textrm{Rad}}=\langle \delta \big (\cdot +(t_0,{\varvec{\xi }}_0)\big ),\phi \rangle _{\textrm{Rad}}, \end{aligned}$$
which shows that
\(\delta \big (\cdot +{\varvec{z}}_0\big )\) with
\({\varvec{z}}_0=(t_0,{\varvec{\xi }}_0)\) is a formal filtered projection of
\(\delta ({\varvec{\xi }}_0^{\textsf{T}}{\varvec{x}} - t_0)\). Moreover, since
\(\delta ({\varvec{\xi }}_0^{\textsf{T}}{\varvec{x}} - t_0)=\delta (-{\varvec{\xi }}_0^{\textsf{T}}{\varvec{x}} +t_0)\), the same holds true for
\(\delta (\cdot -{\varvec{z}}_0)\), as well as for
\(\delta _{\textrm{Rad},{\varvec{z}}_0}{\mathop {=}\limits ^{\vartriangle }}\frac{1}{2} \big (\delta (\cdot -{\varvec{z}}_0)+\delta (\cdot +{\varvec{z}}_0)\big )\), which has the advantage of being symmetric. While the general solution in
\({\mathcal {S}}'_{\textrm{Rad}}\) is
\({\textrm{K}}_{\textrm{rad}}{\textrm{R}}\{\delta ({\varvec{\xi }}_0^{\textsf{T}}\cdot - t_0)\}=[\delta \big (\cdot \pm {\varvec{z}}_0\big )]\), we shall see that there also exists a way to specify a representer that is unique (i.e.,
\(\delta _{\textrm{Rad},{\varvec{z}}_0})\) by restricting the range of
\({\textrm{K}}_{\textrm{rad}}{\textrm{R}}\) to a suitable subspace of measures.
The distributional extension of the Radon transform inherits most of the properties of the “classical” operator defined in (
18). Of special relevance to us is the quasi-commutativity of
\({\textrm{R}}\) with convolution, also known as the
intertwining property. Specifically, let
\(h,f \in {\mathcal {S}}'({\mathbb {R}}^d)\) be two distributions whose convolution
\(h *f\) is well-defined in
\({\mathcal {S}}'({\mathbb {R}}^d)\). Then,
$$\begin{aligned} {\textrm{R}}\{ h *f\}={\textrm{R}}\{ h\} \circledast {\textrm{R}}\{ f\}\end{aligned}$$
(30)
where the symbol “
\(\circledast \)” denotes the 1D convolution along the radial variable
\(t \in {\mathbb {R}}\) with
\((u \circledast g)(t,{\varvec{\xi }})=\langle u(\cdot ,{\varvec{\xi }}),g(t-\cdot ,{\varvec{\xi }}) \rangle \). In particular, when
\(h=\textrm{L}\{\delta \}\) is the (isotropic) impulse response of an LSI operator whose frequency response
\({{\widehat{L}}}({\varvec{\omega }})={{\widehat{L}}}_{\textrm{rad}}(\Vert {\varvec{\omega }}\Vert )\) is purely radial, we get that
$$\begin{aligned} {\textrm{R}}\{ h *f\}=\text {RL}\{f\}=\textrm{L}_{\textrm{rad}}{\textrm{R}}\{f\}, \end{aligned}$$
(31)
where
\(\textrm{L}_{\textrm{rad}}\) is the corresponding Radon-domain operator of Definition
5. Likewise, by duality, for
\(g \in {\mathcal {S}}'({\mathbb {R}}\times {\mathbb {S}}^{d-1})\) we have that
$$\begin{aligned} \text {LR}^*\{g\}={\textrm{R}}^*\textrm{L}_{\textrm{rad}}\{g\}, \end{aligned}$$
(32)
under the implicit assumption that
\(\textrm{L}\{{\textrm{R}}^*g\}\) and
\(\textrm{L}_{\textrm{rad}}\{g\}\) are well-defined distributions. By taking inspiration from Theorem 1, we can then use these relations for
\(\textrm{L}={\textrm{K}}=({\textrm{R}}^*{\textrm{R}})^{-1}\) to show that
\({\textrm{R}}^*{\textrm{K}}_{\textrm{rad}} {\textrm{R}}\{f\}={\textrm{R}}^*\text {RK}\{f\}=\text {KR}^*{\textrm{R}}\{f\}=f\) for a broad class of distributions. The first form is valid for all
\(f \in {\mathcal {S}}'({\mathbb {R}}^d)\) (Theorem
2), but there is a slight restriction with the second form (resp., third form), which requires that
\({\textrm{K}}\{f\}\) (resp.,
\({\textrm{K}}\{g\}\) with
\(g={\textrm{R}}^*{\textrm{R}}\{f\} \in {\mathcal {S}}'({\mathbb {R}}^d)\)) be well-defined in
\({\mathcal {S}}'({\mathbb {R}}^d)\). While the latter condition is always met when
d is odd, it may fail in even dimensions with distributions (e.g., polynomials) whose Fourier transform is singular at the origin. The good news for our regularization framework is that these problematic distributions are either excluded from the native space or annihilated by
\(\textrm{L}\), so that it is legitimate to write that
\(\textrm{L}_{\textrm{R}}={\textrm{K}}_{\textrm{rad}}\text {RL}=\text {RKL}\), where the second form has the advantage that
\({\textrm{K}}\) and
\(\textrm{L}\) can be pooled into a single augmented operator
\((\text {KL})\). An alternative form is
\(\textrm{L}_{\textrm{R}}={\textrm{Q}}_{\textrm{rad}}{\textrm{R}}\), where
\({\textrm{Q}}_{\textrm{rad}}={\textrm{K}}_{\textrm{rad}}{\textrm{L}}_{\textrm{rad}}\) is the radial Radon-domain operator whose frequency response is
\({{\widehat{Q}}}_{\textrm{rad}}(\omega )=c_d |\omega |^{d-1} {{\widehat{L}}}_{\textrm{rad}}(\omega )\).
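As a concrete instantiation (our example), take \(\textrm{L}=(-\varDelta )\), whose frequency response \(\Vert {\varvec{\omega }}\Vert ^2\) is purely radial with profile \({{\widehat{L}}}_{\textrm{rad}}(\omega )=\omega ^2\); then
$$\begin{aligned} {{\widehat{Q}}}_{\textrm{rad}}(\omega )=c_d |\omega |^{d-1}\,\omega ^2=c_d |\omega |^{d+1}, \end{aligned}$$
which is the configuration that yields the ReLU activation mentioned in the introduction. The intertwining relation (30) itself can be checked numerically. The sketch below (ours) uses isotropic Gaussians h and f with variances a and b, for which the Radon profiles and the convolution \(h*f\) are known in closed form.

```python
import numpy as np

d, a, b = 2, 0.8, 1.5                  # dimension and Gaussian variances
t = np.linspace(-12.0, 12.0, 4001)
dt = t[1] - t[0]

# Radon profile of the isotropic Gaussian exp(-||x||^2/(2*s)):
# R{f_s}(t, xi) = (2*pi*s)**((d - 1)/2) * exp(-t**2/(2*s)), for every xi.
prof = lambda s: (2 * np.pi * s) ** ((d - 1) / 2) * np.exp(-t**2 / (2 * s))

# Left-hand side of (30): Radon profile of h * f, using the closed form
# (h * f)(x) = (2*pi*a*b/(a + b))**(d/2) * exp(-||x||^2/(2*(a + b))).
lhs = (2 * np.pi * a * b / (a + b)) ** (d / 2) * prof(a + b)

# Right-hand side of (30): 1D convolution of the two profiles along t.
rhs = np.convolve(prof(a), prof(b), mode="same") * dt

print(np.max(np.abs(lhs - rhs)))       # vanishes up to discretization error
```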
3.3 Radon-Compatible Banach Spaces
Our formulation requires the identification of Radon-domain Banach spaces over which the backprojection operator
\({\textrm{R}}^*\) is invertible. This is a nontrivial point because the extended operator
\({\textrm{R}}^*: {\mathcal {S}}'({\mathbb {R}}\times {\mathbb {S}}^{d-1}) \rightarrow {\mathcal {S}}'({\mathbb {R}}^d)\) in Definition
6 is not injective. In fact, it has the highly nontrivial null space
\(\textrm{ker}({\textrm{R}}^*)={\mathcal {S}}^\perp _{\textrm{Rad}}\), which is a superset of the odd Radon-domain distributions [
20]. Yet,
\({\textrm{R}}^*\) is invertible on
\({\mathcal {S}}'_{\textrm{Rad}}\) and surjective onto
\({\mathcal {S}}'({\mathbb {R}}^d)\) (Theorem
2).
To ensure invertibility, we therefore need to restrict ourselves to Banach spaces that are embedded in
\({\mathcal {S}}'_{\textrm{Rad}}\). To identify such objects, we consider a generic Banach space
\({\mathcal {X}}=({\mathcal {X}}, \Vert \cdot \Vert _{{\mathcal {X}}})\) such that \({\mathcal {S}}({\mathbb {R}}\times {\mathbb {S}}^{d-1}) \hookrightarrow {\mathcal {X}} \hookrightarrow {\mathcal {S}}'({\mathbb {R}}\times {\mathbb {S}}^{d-1})\), with the first embedding being dense. This dense-embedding hypothesis has several implications:
1.
The space
\({\mathcal {X}}\) is the completion of
\({\mathcal {S}}({\mathbb {R}}\times {\mathbb {S}}^{d-1})\) in the
\(\Vert \cdot \Vert _{{\mathcal {X}}}\) norm, i.e.,
$$\begin{aligned} {\mathcal {X}}=\overline{\big ({\mathcal {S}}({\mathbb {R}}\times {\mathbb {S}}^{d-1}), \Vert \cdot \Vert _{{\mathcal {X}}}\big )}. \end{aligned}$$
(33)
2.
The dual space \({\mathcal {X}}'\) is equipped with the norm
$$\begin{aligned} \Vert g\Vert _{{\mathcal {X}}'}=\sup _{\phi \in {\mathcal {X}}:\; \Vert \phi \Vert _{{\mathcal {X}}}\le 1} \langle g, \phi \rangle =\sup _{\phi \in {\mathcal {S}}({\mathbb {R}}\times {\mathbb {S}}^{d-1}):\; \Vert \phi \Vert _{{\mathcal {X}}}\le 1} \langle g, \phi \rangle , \end{aligned}$$
(34)
where the restriction to
\(\phi \in {\mathcal {S}}({\mathbb {R}}\times {\mathbb {S}}^{d-1})\) on the right-hand side of (
34) is justified by the denseness of
\({\mathcal {S}}({\mathbb {R}}\times {\mathbb {S}}^{d-1})\) in
\({\mathcal {X}}\).
3.
The definition of
\(\Vert g\Vert _{{\mathcal {X}}'}\) given by the right-hand side of (
34) is valid for any distribution
\(g\in {\mathcal {S}}'({\mathbb {R}}\times {\mathbb {S}}^{d-1})\) with
\(\Vert g\Vert _{{\mathcal {X}}'}=\infty \) for
\(g \notin {\mathcal {X}}'\). Accordingly, we can specify the topological dual of
\({\mathcal {X}}\) as
$$\begin{aligned} {\mathcal {X}}'=\big \{g \in {\mathcal {S}}'({\mathbb {R}}\times {\mathbb {S}}^{d-1}): \Vert g\Vert _{{\mathcal {X}}'}< \infty \big \}. \end{aligned}$$
(35)
Likewise, based on the pair
\(({\mathcal {S}}_{\textrm{Rad}},{\mathcal {S}}'_{\textrm{Rad}})\), we specify the Radon-compatible Banach subspaces
$$\begin{aligned} {\mathcal {X}}_{\textrm{Rad}}&=\overline{({\mathcal {S}}_{\textrm{Rad}},\Vert \cdot \Vert _{{\mathcal {X}}})} \end{aligned}$$
(36)
$$\begin{aligned} {\mathcal {X}}'_{\textrm{Rad}}&=\big ({\mathcal {X}}_{\textrm{Rad}}\big )'=\big \{g \in {\mathcal {S}}'_{\textrm{Rad}}: \Vert g\Vert _{{\mathcal {X}}'_{\textrm{Rad}}}< \infty \big \} \end{aligned}$$
(37)
where the underlying dual norms have a definition that is analogous to (
34) with
\({\mathcal {S}}_{\textrm{Rad}}\) and
\({\mathcal {X}}_{\textrm{Rad}}\) substituting for
\({\mathcal {S}}({\mathbb {R}}\times {\mathbb {S}}^{d-1})\) and
\({\mathcal {X}}\).
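As a concrete illustration of the dual-norm definition (34), the following sketch (ours; a crude grid discretization) uses the pair \(({\mathcal {X}},{\mathcal {X}}')=(L_p,L_q)\), the prototypical example discussed next: the supremum is attained at \(\phi ^\star \propto \textrm{sign}(g)\,|g|^{q-1}\) and recovers the \(L_q\)-norm of g.

```python
import numpy as np

# Discrete illustration of the dual norm (34) for (X, X') = (L_p, L_q):
# ||g||_q = sup{ <g, phi> : ||phi||_p <= 1 }, with the sup attained at
# phi proportional to sign(g)*|g|**(q - 1).  (Grid approximation; p = 4/3.)
rng = np.random.default_rng(1)
t = np.linspace(-1.0, 1.0, 1000)
dt = t[1] - t[0]
g = rng.standard_normal(t.size)          # a generic (sampled) element of L_q

p = 4 / 3
q = p / (p - 1)                          # conjugate exponent (q = 4)
norm_q = (np.sum(np.abs(g) ** q) * dt) ** (1 / q)

phi = np.sign(g) * np.abs(g) ** (q - 1)
phi /= (np.sum(np.abs(phi) ** p) * dt) ** (1 / p)   # normalize: ||phi||_p = 1
pairing = np.sum(g * phi) * dt                      # duality product <g, phi>

print(norm_q, pairing)                   # the two numbers coincide
```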
The prototypical examples where those properties are met are \(({\mathcal {X}}, {\mathcal {X}}')=\big (L_p({\mathbb {R}}\times {\mathbb {S}}^{d-1}),L_q({\mathbb {R}}\times {\mathbb {S}}^{d-1})\big )\) with \(p\in (1,\infty )\) and \(q=p/(p-1)\) (conjugate exponent), as well as \(({\mathcal {X}}, {\mathcal {X}}')=\big (C_0({\mathbb {R}}\times {\mathbb {S}}^{d-1}),{\mathcal {M}}({\mathbb {R}}\times {\mathbb {S}}^{d-1})\big )\). In fact, those hyper-spherical spaces have the convenient feature of admitting a decomposition into their even and odd components.
Correspondingly, we get that \({\mathcal {X}}'_{\textrm{Rad}}={\textrm{P}}_{\textrm{even}}({\mathcal {X}}')={\mathcal {X}}'_{\textrm{even}}\) and \(({\mathcal {X}}^{\textrm{c}}_{\textrm{Rad}})'=(\textrm{Id}-{\textrm{P}}_{\textrm{even}})({\mathcal {X}}')={\mathcal {X}}'_{\textrm{odd}}\), with the cases of greatest interest to us being \({\mathcal {M}}_{\textrm{Rad}}={\mathcal {M}}_{\textrm{even}}({\mathbb {R}}\times {\mathbb {S}}^{d-1}) \) and \(L_{2,\mathrm Rad}=L_{2,\textrm{even}}({\mathbb {R}}\times {\mathbb {S}}^{d-1})\).
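In the \(d=2\) case, where \({\varvec{\xi }}=(\cos \theta ,\sin \theta )\), the even projector \(g\mapsto \frac{1}{2}\big (g(t,{\varvec{\xi }})+g(-t,-{\varvec{\xi }})\big )\) admits a short implementation on a sampled sinogram. Below is a sketch of ours; the grid conventions are assumptions.

```python
import numpy as np

def project_even(g):
    """P_even for a d = 2 sinogram sampled as g[i, j] = g(t_i, theta_j),
    with the t-grid symmetric about 0 (e.g., linspace(-T, T, n_t)) and
    theta_j uniform on [0, 2*pi).  The antipodal map (t, xi) -> (-t, -xi)
    then reads (t_i, theta_j) -> (-t_i, theta_j + pi)."""
    n_theta = g.shape[1]
    assert n_theta % 2 == 0, "need an even number of angles to shift by pi"
    g_antipodal = np.roll(g[::-1, :], n_theta // 2, axis=1)  # g(-t, -xi)
    return 0.5 * (g + g_antipodal)

# Any consistent sinogram g = R{f} is even, so project_even leaves it intact;
# the discarded odd part lies in the null space of the backprojection R*.
```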
The Fourier-slice theorem expressed by (
20) remains valid for tempered distributions [
43] and therefore also yields a characterization of
\({\textrm{R}}\{ f\}\) that is compatible with the Banach framework of Theorem
3. It is especially helpful when the underlying function
\(\rho _{\textrm{iso}}\) is isotropic with a known radial frequency profile
\({{\widehat{\rho }}}_{\textrm{rad}}\) such that
\( {\mathcal {F}}\{ \rho _{\textrm{iso}}\}({\varvec{\omega }})={{\widehat{\rho }}}_{\textrm{rad}}(\Vert {\varvec{\omega }}\Vert )\).
The other important building blocks for representing functions are ridges. Specifically, a ridge is a multidimensional function
$$\begin{aligned} r_{{\varvec{\xi }}_0}:{\mathbb {R}}^d \rightarrow {\mathbb {R}}: {\varvec{x}} \mapsto r({\varvec{\xi }}_0^{\textsf{T}}{\varvec{x}}) \end{aligned}$$
(43)
that is characterized by a profile
\(r: {\mathbb {R}}\rightarrow {\mathbb {R}}\) and a direction
\({\varvec{\xi }}_0 \in {\mathbb {S}}^{d-1}\). In effect,
\(r_{{\varvec{\xi }}_0}\) varies along the axis specified by
\({\varvec{\xi }}_0\) and is constant within any hyperplane perpendicular to
\({\varvec{\xi }}_0\). The connection between ridges and the Radon transform is given by the
ridge identity
$$\begin{aligned} \forall \varphi \in {\mathcal {S}}({\mathbb {R}}^d): \quad \langle r_{{\varvec{\xi }}_0}, \varphi \rangle =\langle r, {\textrm{R}}\{\varphi \} (\cdot ,{\varvec{\xi }}_0)\rangle , \end{aligned}$$
(44)
where the right-hand side duality product (1D) is well-defined for any
\(r \in {\mathcal {S}}'({\mathbb {R}})\) because
\({\textrm{R}}\{\varphi \} (\cdot ,{\varvec{\xi }}_0) \in {\mathcal {S}}({\mathbb {R}})\) (by Theorem
1). When the profile
\(r: {\mathbb {R}}\rightarrow {\mathbb {R}}\) is locally integrable, (
44) is a simple consequence of Fubini’s theorem. For more general distributional profiles
\(r \in {\mathcal {S}}'({\mathbb {R}})\), we use the ridge identity as definition, which then leads to the following characterization [
57].
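The ridge identity (44) lends itself to a direct numerical verification. In the sketch below (ours), the profile is the ReLU \(r(t)=\max (t,0)\) and the test function is a Gaussian in \(d=2\); both sides of (44) then evaluate to \(\sqrt{2\pi }\).

```python
import numpy as np

relu = lambda t: np.maximum(t, 0.0)                  # profile r
xi0 = np.array([np.cos(0.4), np.sin(0.4)])           # direction on S^1

# Left-hand side of (44): <r_xi0, phi> as a 2D quadrature, with the test
# function phi(x) = exp(-||x||^2/2).
u = np.linspace(-8.0, 8.0, 1201)
du = u[1] - u[0]
X, Y = np.meshgrid(u, u, indexing="ij")
phi = np.exp(-0.5 * (X**2 + Y**2))
lhs = np.sum(relu(xi0[0] * X + xi0[1] * Y) * phi) * du**2

# Right-hand side of (44): <r, R{phi}(., xi0)> as a 1D quadrature, using
# the closed-form slice R{phi}(t, xi0) = sqrt(2*pi) * exp(-t^2/2).
t = np.linspace(-8.0, 8.0, 4001)
R_phi = np.sqrt(2 * np.pi) * np.exp(-0.5 * t**2)
rhs = np.sum(relu(t) * R_phi) * (t[1] - t[0])

print(lhs, rhs)   # both ~ sqrt(2*pi) ~ 2.5066
```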
An important special case of (
46) is the Radon transform of a Dirac ridge:
\({\textrm{K}}_{\textrm{rad}}{\textrm{R}} \{\delta ({\varvec{\xi }}_0^{\textsf{T}}\cdot -t_0) \}= \delta _{\textrm{Rad},(t_0, {\varvec{\xi }}_0)}= \frac{1}{2} \big (\delta (\cdot -t_0)\delta (\cdot -{\varvec{\xi }}_0) + \delta (\cdot +t_0)\delta (\cdot +{\varvec{\xi }}_0)\big )\), which has already been mentioned in Sect.
3.2 (see also [
35, Example 1]).