In this article, we analyze tensor approximation schemes for continuous functions. We assume that the function to be approximated lies in an isotropic Sobolev space and discuss the cost when approximating this function in the continuous analogue of the Tucker tensor format or of the tensor train format. We especially show that the cost of both approximations are dimension-robust when the Sobolev space under consideration provides appropriate dimension weights.
Communicated by Wolfgang Dahmen.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
1 Introduction
The efficient approximate representation of multivariate functions is an important task in numerical analysis and scientific computing. In this article, we hence consider the approximation of functions which live on the product of m bounded domains \(\varOmega _1\times \dots \times \varOmega _m\), each of which satisfies \(\varOmega _j\subset {\mathbb {R}}^{n_j}\). Besides a sparse grid approximation of the function under consideration, being discussed in [8, 17, 18, 50], one can also apply a low-rank approximation by means of a tensor approximation scheme, see, e.g., [15, 21, 22, 34, 35] and the references therein.
The low-rank approximation in the situation of the product of \(m=2\) domains is well understood. It is related to the singular value decomposition and has been studied for arbitrary product domains in [19, 20], see also [46‐48] for the periodic case. However, the situation is not that clear for the product of \(m>2\) domains, where one ends up with tensor decompositions. Such tensor decompositions are generalizations of the well-known singular value decomposition and the corresponding low-rank matrix approximation methods of two dimensions to the higher-dimensional setting. There, besides the curse of dimension, we encounter—due to the nonexistence of an Eckart–Young–Mirsky theorem—that the concepts of singular value decomposition and low-rank approximation can be generalized to higher dimensions in more than one way. Consequently, there exist many generalizations of the singular value decomposition of a function and of low-rank approximations to tensors. To this end, various schemes have been developed over the years in different areas of the sciences and have successfully been applied to various high-dimensional problems ranging from quantum mechanics and physics via biology and econometrics, computer graphics and signal processing to numerical analysis. Recently, tensor methods have even been recognized as special deep neural networks in machine learning and big data analysis [11, 28]. As tensor approximation schemes, we have, for example, matrix product states, DMRG, MERA, PEPS, CP, CANDECOMP, PARAFAC, Tucker, tensor train, tree tensor networks and hierarchical Tucker, to name a few. A mathematical introduction into tensor methods is given in the seminal book [21], while a survey on existing methods and their literature can be found in [16]. Also, various software packages have been developed for an algebra of operators dealing with tensors.
Tensor methods are usually analyzed as low-rank approximations to a full discrete tensor of data with respect to the \(\ell _2\)-norm or Frobenius norm. In this respect, they can be seen as compression methods which may avoid the curse of dimensionality inherent in the full tensor representation. The main tool for studying tensor compression schemes is the so-called tensor rank, compare [12, 13, 21]. Thus, instead of \({\mathcal {O}}(N^n)\) storage, as little as \({\mathcal {O}}(nNr^3)\) or even only \({\mathcal {O}}(nNr^2)\) storage is needed, where N denotes the number of data points in one coordinate direction, n denotes the dimension of the tensor under consideration, and r denotes the respective tensor rank of the data. The cost complexity of the various algorithms working with sparse tensor representations is correspondingly reduced, and working in a sparse tensor format makes it possible to alleviate or even completely break the curse of dimension for suitable tensor data classes, i.e., for sufficiently small r.
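To make these storage counts concrete, consider the following toy comparison; all sizes are hypothetical and chosen only to make the gap between the full and the low-rank representation visible.

```python
# Hypothetical sizes, for illustration only: N points per direction,
# tensor order n, and tensor rank r.
N, n, r = 100, 6, 5

full_storage = N ** n          # O(N^n): entries of the full tensor
tt_storage = n * N * r ** 2    # O(nNr^2): e.g., storage of a tensor train

print(full_storage, tt_storage)
```

Already for these modest sizes, the full tensor needs \(10^{12}\) entries, while the low-rank format needs only 15,000.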
However, the question where the tensor data stem from and the issue of the accuracy of the full tensor approximation, i.e., the discretization error of the full tensor itself and its relation to the error of a subsequent low-rank tensor approximation, are usually not adequately addressed.1 Instead, only the approximation property of a low-rank tensor scheme with respect to the full tensor data is considered. But the former question is important since it clearly makes no sense to derive a tensor approximation with an error that is substantially smaller than the error which is already inherent in the full tensor data due to some discretization process for a continuous high-dimensional function from a certain function class.
The approximation rates to continuous functions can be determined by a recursive use of the singular value decomposition, which is successively applied to convert the function into a specific continuous tensor format. We studied the singular value decomposition for arbitrary domains in [19, 20], and we can now apply these results to discuss approximation rates of continuous tensor formats. In the present article, given a function \(f\in H^k(\varOmega _1\times \dots \times \varOmega _m)\), we study the continuous analogues of the Tucker tensor decomposition and of the tensor train decomposition. We give bounds on the ranks required to ensure that the tensor decomposition admits a prescribed target accuracy. Particularly, our analysis takes into account the influence of errors induced by truncating infinite expansions to finite ones. We therefore study an algorithm that computes the desired tensor expansion, which is in contrast to the question of determining the smallest tensor rank. We finally show that (isotropic) Sobolev spaces with dimension weights help to beat the curse of dimension when the number m of product domains tends to infinity.
Besides the simple situation of \(\varOmega _1 = \dots = \varOmega _m = [0,1]\), which is usually considered in case of tensor decompositions, there are many more applications of our general setting. For example, non-Newtonian flow can be modeled by a coupled system which consists of the Navier–Stokes equation for the flow in a three-dimensional geometry described by \(\varOmega _1\) and of the Fokker–Planck equation in a \(3(\ell -1)\)-dimensional configuration space \(\varOmega _2\times \dots \times \varOmega _{\ell }\), consisting of \(\ell -1\) spheres. Here, \(\ell \) denotes the number of atoms in a chain-like molecule which constitutes the non-Newtonian behavior of the flow, for details, see [4, 29, 31, 37]. Another example is homogenization. After unfolding [10], a two-scale homogenization problem gives rise to the product of the macroscopic physical domain and the periodic microscopic domain of the cell problem, see [32]. For multiple scales, several periodic microscopic domains appear which reflect the different separable scales, see, e.g., [27]. Also, the m-th moment of linear elliptic boundary value problems with random source terms, i.e., \(Au(\omega )=f(\omega )\) in \(\varOmega \), is known to satisfy a deterministic partial differential equation on the m-fold product domain \(\varOmega \times \dots \times \varOmega \). There, the solution’s m-th moment \({\mathcal {M}}_u\) is given by the equation
see [40, 41]. This approach extends to boundary value problems with random diffusion and to random domains as well [9, 25]. Moreover, we find the product of several domains in quantum mechanics for the Schrödinger equation or the Langevin equation, where each domain is three-dimensional and corresponds to a single particle. Finally, we encounter it in uncertainty quantification, where one has the product of the physical domain \(\varOmega _1\) and of in general infinitely many intervals \(\varOmega _2 = \varOmega _3 = \varOmega _4 = \dots \) for the random input parameter, which reflects its series expansion by the Karhunen–Loève decomposition or the Lévy–Ciesielski decomposition.
The remainder of this article is organized as follows: In Sect. 2, we give a short introduction to our results on the singular value decomposition, which are needed to derive the estimates for the continuous tensor decompositions. Then, in Sect. 3, we study the continuous Tucker tensor format, computed by means of the higher-order singular value decomposition. Next, we study the continuous tensor train decomposition in Sect. 4, computed by means of a repeated use of a vector-valued singular value decomposition. Finally, Sect. 5 concludes with some final remarks.
Throughout this article, to avoid the repeated use of generic but unspecified constants, we denote by \(C \lesssim D\) that C is bounded by a multiple of D independently of parameters which C and D may depend on. Obviously, \(C \gtrsim D\) is defined as \(D \lesssim C\), and \(C \sim D\) as \(C \lesssim D\) and \(C \gtrsim D\). Moreover, given a Lipschitz-smooth domain \(\varOmega \subset {\mathbb {R}}^n\), \(L^2(\varOmega )\) means the space of square integrable functions on \(\varOmega \). For real numbers \(k\ge 0\), the associated Sobolev space is denoted by \(H^k(\varOmega )\), where its norm \(\Vert \cdot \Vert _{H^k(\varOmega )}\) is defined in the standard way, compare [33, 45]. As usual, we have \(H^0(\varOmega ) = L^2(\varOmega )\). The seminorm in \(H^k(\varOmega )\) is denoted by \(|\cdot |_{H^k(\varOmega )}\). Although not explicitly written, our subsequent analysis also covers the situation of \(\varOmega \) being not a domain but a (smooth) manifold.
2 Singular Value Decomposition
2.1 Definition and Calculation
Let \(\varOmega _1\subset {\mathbb {R}}^{n_1}\) and \(\varOmega _2\subset {\mathbb {R}}^{n_2}\) be Lipschitz-smooth domains. To represent functions \(f\in L^2(\varOmega _1\times \varOmega _2)\) on the tensor product domain \(\varOmega _1\times \varOmega _2\) in an efficient way, we will consider low-rank approximations which separate the variables \({\varvec{x}}\in \varOmega _1\) and \({\varvec{y}}\in \varOmega _2\) in accordance with
It is well known (see, e.g., [30, 38, 42]) that the best possible representation (2.1) in the \(L^2\)-sense is given by the singular value decomposition, also called Karhunen–Loève expansion.2 Then, the coefficients \(\sqrt{\lambda (\alpha )}\in {\mathbb {R}}\) are the singular values, and the \(\varphi (\alpha )\in L^2(\varOmega _1)\) and \(\psi (\alpha )\in L^2(\varOmega _2)\) are the left and right (\(L^2\)-normalized) eigenfunctions of the integral operator
is the adjoint of \({\mathcal {S}}_f\). Particularly, the left and right eigenfunctions \(\{\varphi (\alpha )\}_{\alpha =1}^\infty \) and \(\{\psi (\alpha )\}_{\alpha =1}^\infty \) form orthonormal bases in \(L^2(\varOmega _1)\) and \(L^2(\varOmega _2)\), respectively.
In order to compute the singular value decomposition, we need to solve the eigenvalue problem
It holds \(\lambda (\alpha ) = {\widetilde{\lambda }}(\alpha )\), and the sequences \(\{\varphi (\alpha )\}\) and \(\{\psi (\alpha )\}\) are related by (2.2).
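As an illustration (not part of the article's analysis), the continuous singular value decomposition can be approximated by sampling f on a grid and computing a matrix SVD; with midpoint-rule scaling, the discrete singular values approximate those of the integral operator \({\mathcal {S}}_f\). The separable test function below is an arbitrary choice.

```python
import numpy as np

def continuous_svd(f, n=200):
    """Discrete surrogate of the continuous SVD of f on [0,1]^2."""
    x = (np.arange(n) + 0.5) / n                 # midpoint quadrature nodes
    F = f(x[:, None], x[None, :])                # samples f(x_i, y_j)
    U, s, Vt = np.linalg.svd(F, full_matrices=False)
    h = 1.0 / n                                  # quadrature weight
    # Rescale so that the columns of U and rows of Vt are L^2-normalized
    # and s approximates the singular values of the integral operator.
    return U / np.sqrt(h), s * h, Vt / np.sqrt(h)

# A separable (rank-one) example: all mass sits in the first singular
# value, which equals ||sin(pi .)|| * ||cos(pi .)|| = 1/2 on [0,1].
U, s, Vt = continuous_svd(lambda x, y: np.sin(np.pi * x) * np.cos(np.pi * y))
print(s[0], s[1])   # approximately 0.5, and numerically zero
```

For non-separable functions, the same surrogate exhibits the eigenvalue decay quantified in Sect. 2.3.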
2.2 Regularity of the Eigenfunctions
Now, we consider functions \(f\in H^k(\varOmega _1\times \varOmega _2)\). In the following, we collect results from [19, 20] concerning the singular value decomposition of such functions. We repeat the proof whenever needed for having explicit constants. To this end, we define the mixed Sobolev space \(H_{{\text {mix}}}^{k,\ell }(\varOmega _1\times \varOmega _2)\) as a tensor product of Hilbert spaces
Note that we have used here \(H_{{\text {mix}}}^{0,-k}(\varOmega _1\times \varOmega _2) = L^2(\varOmega _1)\otimes H^{-k}(\varOmega _2)\). Proceeding likewise for \({\mathcal {S}}_f^\star : L^2(\varOmega _2)\rightarrow H^k(\varOmega _1)\) completes the proof. \(\square \)
Lemma 2
Assume that \(f\in H^k(\varOmega _1\times \varOmega _2)\) for some fixed \(k\ge 0\). Then, it holds \({\mathcal {S}}_f\varphi (\alpha )\in H^k(\varOmega _2)\) and \({\mathcal {S}}_f^\star \psi (\alpha )\in H^k(\varOmega _1)\) for all \(\alpha \in {\mathbb {N}}\) with
We will show later in Lemma 4 how to improve this estimate by sacrificing some regularity.
2.3 Truncation Error
We next give estimates on the decay rate of the eigenvalues of the integral operator \({\mathcal {K}}_f = {\mathcal {S}}_f^\star {\mathcal {S}}_f\) with kernel (2.4). To this end, we exploit the smoothness in the function’s first variable and assume hence \(f\in H_{{\text {mix}}}^{k,0}(\varOmega _1 \times \varOmega _2)\). We introduce finite element spaces \(U_r\subset L^2(\varOmega _1)\), which consist of r discontinuous, piecewise polynomial functions of total degree \(\lceil k\rceil \) on a quasi-uniform triangulation of \(\varOmega _1\) with mesh width \(h_r\sim r^{-1/n_1}\). Then, given a function \(w\in H^k(\varOmega _1)\), the \(L^2\)-orthogonal projection \(P_r:L^2(\varOmega _1)\rightarrow U_r\) satisfies
uniformly in \(r\) due to the Bramble–Hilbert lemma, see, e.g., [6, 7].
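The projection estimate can be checked empirically in the lowest-order case of piecewise constants on [0, 1], where it predicts first-order convergence in the mesh width \(h_r\sim r^{-1}\); the test function below is an illustrative choice.

```python
import numpy as np

def l2_proj_error(w, r, samples=100):
    """L2 error of the projection of w onto piecewise constants on r cells."""
    err2 = 0.0
    for i in range(r):
        t = (i + (np.arange(samples) + 0.5) / samples) / r  # nodes in cell i
        v = w(t)
        # The L2-orthogonal projection onto constants is the cell average.
        err2 += np.mean((v - v.mean()) ** 2) / r
    return np.sqrt(err2)

e1, e2 = l2_proj_error(np.sin, 16), l2_proj_error(np.sin, 32)
print(e2 / e1)   # close to 1/2: first-order convergence, as predicted
```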
For the approximation of \(f({\varvec{x}},{\varvec{y}})\) in the first variable, i.e., \(\big ((P_r\otimes I)f\big )({\varvec{x}},{\varvec{y}})\), we obtain the following approximation result for the present choice of \(U_r\), see [20] for the proof.
Lemma 3
Assume that \(f\in H^k(\varOmega _1\times \varOmega _2)\) for some fixed \(k\ge 0\). Let \(\lambda (1)\ge \lambda (2)\ge \ldots \ge 0\) be the eigenvalues of the operator \({\mathcal {K}}_f={\mathcal {S}}_f^\star {\mathcal {S}}_f\) and \(\lambda _r(1)\ge \lambda _r(2)\ge \ldots \ge \lambda _r(r)\ge 0\) those of \({\mathcal {K}}_f^{\,r}:= P_r{\mathcal {K}}_fP_r\). Then, it holds
By combining this lemma with the approximation estimate (2.6) and in view of \(\lambda (\alpha )-\lambda _r(\alpha ) \ge 0\) for all \(\alpha \in \{1,2,\ldots ,r\}\) according to the min-max theorem of Courant–Fischer, see [1], for example, we conclude that the truncation error of the singular value decomposition can be bounded by
Since the eigenvalues of the integral operator \({\mathcal {K}}_f\) and its adjoint \(\widetilde{{\mathcal {K}}}_f\) are the same, we can also exploit the smoothness of f in the second coordinate by interchanging the roles of \(\varOmega _1\) and \(\varOmega _2\) in the above considerations. We thus obtain the following theorem:
Theorem 1
Let \(f\in H^k(\varOmega _1\times \varOmega _2)\) for some fixed \(k\ge 0\), and let
Theorem 1 implies that the eigenvalues \(\{\lambda (\alpha )\}_{\alpha \in {\mathbb {N}}}\) in case of a function \(f\in H^k(\varOmega _1\times \varOmega _2)\) decay like
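This decay is easy to observe numerically; in the sketch below, the smooth kernel and the grid size are arbitrary illustrative choices, and the division by the grid size mimics the quadrature scaling of the underlying integral operator.

```python
import numpy as np

# Empirical illustration of the eigenvalue decay for a smooth kernel.
n = 400
x = (np.arange(n) + 0.5) / n                  # midpoint nodes on [0,1]
F = np.exp(-x[:, None] * x[None, :])          # samples of a smooth kernel
s = np.linalg.svd(F, compute_uv=False) / n    # quadrature-scaled values
print(s[:5] / s[0])   # rapid decay within the first few singular values
```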
Having the decay rate of the eigenvalues at hand, we are able to improve the result of Lemma 2 by sacrificing some regularity. Note that the proof of this result is based upon an argument from [43].
Lemma 4
Assume that \(f\in H^{k+\min \{n_1,n_2\}}(\varOmega _1\times \varOmega _2)\) for some fixed \(k\ge 0\). Then, it holds
Without loss of generality, we assume \(n_1\!\le \! n_2\). Then, since \(f\!\in \! H^{k+n_1}(\varOmega _1\!\times \!\varOmega _2)\), we conclude from (2.8) that
where we used that \(n_1=\min \{n_1,n_2\}\). Moreover, by interpolating between \(L^2(\varOmega _1)\) and \(H^{k+n_1}(\varOmega _1)\), compare [33, 45], for example, we find
converges for almost all \({\varvec{x}}\in \varOmega _1\) and \({\varvec{y}}\in \varOmega _2\), provided that \(|\varvec{\beta }|\le k\). Because of Egorov’s theorem, the pointwise absolute convergence almost everywhere implies uniform convergence. Hence, we can switch differentiation and summation to get
Finally, we exploit the product structure of \(L^2(\varOmega _1\times \varOmega _2)\) and the orthonormality of \(\{\psi (\alpha )\}_{\alpha \in {\mathbb {N}}}\) to derive the first assertion, i.e.,
The second assertion follows in complete analogy. \(\square \)
2.4 Vector-Valued Functions
In addition to the aforementioned results, we will also need the following result which concerns the approximation of vector-valued functions. Here and in the sequel, the vector-valued function \({\varvec{w}} = [w(\alpha )]_{\alpha =1}^m\) is an element of \([L^2(\varOmega )]^m\) and \([H^k(\varOmega )]^m\) for some domain \(\varOmega \subset {\mathbb {R}}^n\), respectively, if the norms
since \({\varvec{w}}\) consists of m components and we thus need m-times as many ansatz functions for our approximation argument. Hence, in case of a vector-valued function \({\varvec{f}}\in [H_{{\text {mix}}}^{k,0}(\varOmega _1\times \varOmega _2)]^m\simeq [H^k(\varOmega _1)]^m \otimes L^2(\varOmega _2)\), we conclude by exploiting the smoothness in the first variable3 that the truncation error of the singular value decomposition can be estimated by
provided that \({\varvec{f}}\) has extra regularity in terms of \({\varvec{f}}\in [H^{k+n_1}(\varOmega _1\times \varOmega _2)]^m\). Here, analogously to above, \([H_{{\text {mix}}}^{0,k}(\varOmega _1\times \varOmega _2)]^m\simeq [L^2(\varOmega _1)]^m\otimes H^k(\varOmega _2)\).
After these preparations, we now introduce and analyze two types of continuous analogues of tensor formats, namely of the Tucker format [26, 49] and of the tensor train format [34, 36], and discuss their approximation properties for functions \(f\in H^k(\varOmega _1\times \dots \times \varOmega _m)\).
3 Tucker Tensor Format
3.1 Tucker Decomposition
We shall consider from now on a product domain which consists of m different domains \(\varOmega _j\subset {\mathbb {R}}^{n_j}\), \(j=1,\ldots ,m\). For given \(f\in L^2(\varOmega _1\times \dots \times \varOmega _m)\) and \(j\in \{1,2,\ldots ,m\}\), we apply the singular value decomposition to separate the variables \({\varvec{x}}_j\in \varOmega _j\) and \(({\varvec{x}}_1,\ldots , {\varvec{x}}_{j-1},{\varvec{x}}_{j+1},\ldots ,{\varvec{x}}_m)\in \varOmega _1 \times \dots \times \varOmega _{j-1}\times \varOmega _{j+1}\times \dots \times \varOmega _m\). We hence get
where the left eigenfunctions \(\{\varphi _j(\alpha _j)\}_{\alpha _j\in {\mathbb {N}}}\) form an orthonormal basis in \(L^2(\varOmega _j)\). Consequently, if we iterate over all \(j\in \{1,2,\ldots ,m\}\), this yields an orthonormal basis \(\{\varphi _1(\alpha _1)\otimes \cdots \otimes \varphi _m(\alpha _m)\}_{\varvec{\alpha }\in {\mathbb {N}}^m}\) of \(L^2(\varOmega _1\times \dots \times \varOmega _m)\), and we arrive at the representation
Herein, the tensor \(\big [\omega (\varvec{\alpha })\big ]_{\varvec{\alpha }\in {\mathbb {N}}^m}\) is the core tensor, where a single coefficient is given by
3.2 Truncation Error
If we intend to truncate the singular value decomposition (3.1) after \(r_j\) terms such that the truncation error is bounded by \(\varepsilon \), we have to choose
We have the following result on the Tucker decomposition:
Theorem 2
Let \(f\in H^k(\varOmega _1\times \dots \times \varOmega _m)\) for some fixed \(k>0\) and \(0<\varepsilon <1\). If the ranks are chosen according to \(r_j=\varepsilon ^{-n_j/k}\) for all \(j=1,\dots ,m\), then the truncation error of the truncated Tucker decomposition is
while the storage cost for the core tensor of \(f_{r_1,\ldots ,r_m}^{{\text {TF}}}\) is \(\prod _{j=1}^m r_j = \varepsilon ^{-(n_1+\dots +n_m)/k}\).
Proof
For the approximation of the core tensor, the sets of the univariate eigenfunctions \(\{\varphi _j(\alpha _j)\}_{\alpha _j=1}^{r_j}\) are used for all \(j=1,\ldots ,m\), cf. (3.2). Due to orthonormality, we find
for all \(j\in \{1,2,\ldots ,m\}\), we arrive with (3.3) and the summation over \(j = 1,\ldots ,m\) at the desired error estimate. This completes the proof, since the estimate on the number of coefficients in the core tensor is obvious. \(\square \)
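A discrete counterpart of this construction is the higher-order SVD of a data tensor: each matricization yields one orthonormal factor, and projecting onto the leading \(r_j\) basis vectors produces the truncated core tensor. The sketch below, including the low-multilinear-rank test tensor, is purely illustrative.

```python
import numpy as np

def hosvd(T, ranks):
    """Truncated higher-order SVD: factor matrices plus core tensor."""
    factors = []
    for j, r in enumerate(ranks):
        # j-th matricization of T; its left singular vectors give phi_j.
        M = np.moveaxis(T, j, 0).reshape(T.shape[j], -1)
        U, _, _ = np.linalg.svd(M, full_matrices=False)
        factors.append(U[:, :r])
    core = T
    for j, U in enumerate(factors):   # project onto the retained bases
        core = np.moveaxis(np.tensordot(U.T, np.moveaxis(core, j, 0), axes=1), 0, j)
    return core, factors

def reconstruct(core, factors):
    T = core
    for j, U in enumerate(factors):
        T = np.moveaxis(np.tensordot(U, np.moveaxis(T, j, 0), axes=1), 0, j)
    return T

rng = np.random.default_rng(0)
# A test tensor that is a sum of 3 separable terms: multilinear ranks <= 3.
A, B, C = rng.standard_normal((3, 3, 20))
T = np.einsum('ri,rj,rk->ijk', A, B, C)
core, factors = hosvd(T, ranks=(3, 3, 3))
err = np.linalg.norm(T - reconstruct(core, factors))
print(err)   # essentially zero: ranks (3,3,3) capture T exactly
```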
3.3 Sobolev Spaces with Dimension Weights
The cost of the core tensor of the Tucker decomposition exhibits the curse of dimension as the number m of subdomains increases. This can be seen most simply for the example \(n_j=n\). Then, the cost is \(\varepsilon ^{-nm/k}\), which expresses the curse of dimension as long as k is not proportional to m. Nonetheless, in case of Sobolev spaces with dimension weights, the curse of dimension can be beaten.
For \(f\in H^{k+n}(\varOmega ^m)\), we shall discuss the situation \(m\rightarrow \infty \) in more detail. To this end, we assume that all subdomains are identical to a single domain \(\varOmega \subset {\mathbb {R}}^n\) of dimension n and note that the limit \(m\rightarrow \infty \) only makes sense when weights are included in the underlying Sobolev spaces which ensure that higher dimensions become less important. For our proofs, we choose as usual m arbitrary but fixed and show the existence of m-independent constants in the convergence and cost estimates.
The Sobolev spaces \(H_{\varvec{\gamma }}^k(\varOmega ^m)\) with dimension weights \(\varvec{\gamma }\in {\mathbb {R}}^m\) which we consider are given by all functions \(f\in H^k(\varOmega ^m)\) such that
The definition in (3.4) means that, given a function f with norm \(\Vert f\Vert _{H^k(\varOmega ^m)}<\infty \), the partial derivatives with respect to \({\varvec{x}}_j\) become less important as the dimension j increases. Such functions appear, for example, in uncertainty quantification. Let be given a Karhunen–Loève expansion
and insert it into a function \(b:{\mathbb {R}}\rightarrow {\mathbb {R}}\) of finite smoothness \(W^{k,\infty }({\mathbb {R}})\). Then, the function \(b\big (u(\mathbf{x},\mathbf{y})\big )\) satisfies (3.4) with respect to the \(\mathbf{y}\)-variable, where \(\gamma _j = \sigma _j\). Hence, the solution of a given partial differential equation would satisfy a decay estimate similar to (3.4) whenever the stochastic field enters the partial differential equation through a non-smooth coefficient function b, compare [14, 23, 24], for example.
It turns out that algebraically decaying weights (3.5) are sufficient to beat the curse of dimension in case of the Tucker tensor decomposition.4
Theorem 3
Given \(\delta > 0\), let \(f\in H_{\varvec{\gamma }}^k(\varOmega ^m)\) for some fixed \(k>0\) with weights (3.4) that decay like
is of order \(\varepsilon \), while the storage cost for the core tensor of \(f_{r_1,\ldots ,r_m}^{{\text {TF}}}\) is bounded by \(\varepsilon ^{-n/k}\) independent of the dimension m.
Proof
In view of Theorem 1 and (3.4), we deduce by choosing the ranks as in (3.3) that
4 Tensor Train Format
4.1 Tensor Train Decomposition
For the discussion of the continuous tensor train decomposition, we should assume that the domains \(\varOmega _j\subset {\mathbb {R}}^{n_j}\), \(j=1,\ldots ,m\), are arranged in such a way that it holds \(n_1\le \dots \le n_m\).5
Now, consider \(f\in H^k(\varOmega _1\times \dots \times \varOmega _m)\) and separate the variables \({\varvec{x}}_1\in \varOmega _1\) and \(({\varvec{x}}_2,\ldots ,{\varvec{x}}_m)\in \varOmega _2\times \dots \times \varOmega _m\) by the singular value decomposition
we can separate \((\alpha _1,{\varvec{x}}_2)\in {\mathbb {N}} \times \varOmega _2\) from \(({\varvec{x}}_3,\ldots ,{\varvec{x}}_m)\in \varOmega _3\times \dots \times \varOmega _m\) by means of a second singular value decomposition and arrive at
By repeating the last step and successively separating \((\alpha _{j-1},{\varvec{x}}_j)\in {\mathbb {N}}\times \varOmega _j\) from \(({\varvec{x}}_{j+1},\ldots ,{\varvec{x}}_m)\in \varOmega _{j+1} \times \dots \times \varOmega _m\) for \(j=3,\ldots ,m-1\), we finally arrive at the representation
In contrast to the Tucker format, we do not obtain a huge core tensor since each of the \(m-1\) singular value decompositions of the tensor train decomposition removes the current first spatial domain from the approximant. We just obtain a product of matrix-valued functions (except for the first and last factor, which are vector-valued functions), each of which is related to a specific domain \(\varOmega _j\). This especially results in only \(m-1\) sums in contrast to the m sums for the Tucker format.
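A discrete analogue of this sweep is the TT-SVD: at step j, one SVD separates the pair \((\alpha _{j-1},{\varvec{x}}_j)\) from the remaining variables. The following sketch mirrors that procedure; the truncation tolerance and the test tensor are illustrative choices.

```python
import numpy as np

def tt_svd(T, tol=1e-10):
    """Decompose the tensor T into tensor train cores by successive SVDs."""
    dims = T.shape
    cores, r = [], 1
    M = T.reshape(dims[0], -1)
    for j in range(len(dims) - 1):
        U, s, Vt = np.linalg.svd(M, full_matrices=False)
        r_new = max(1, int((s > tol * s[0]).sum()))         # truncation rank
        cores.append(U[:, :r_new].reshape(r, dims[j], r_new))
        # Fold the next variable into the row index and continue.
        M = (s[:r_new, None] * Vt[:r_new]).reshape(r_new * dims[j + 1], -1)
        r = r_new
    cores.append(M.reshape(r, dims[-1], 1))
    return cores

def tt_reconstruct(cores):
    T = cores[0]
    for G in cores[1:]:
        T = np.tensordot(T, G, axes=(-1, 0))   # contract the rank index
    return T[0, ..., 0]

# sin(x1+x2+x3+x4) has exact tensor train ranks 2 at every separation.
x = np.linspace(0.0, 1.0, 8)
T = np.sin(x[:, None, None, None] + x[None, :, None, None]
           + x[None, None, :, None] + x[None, None, None, :])
cores = tt_svd(T)
ranks = [G.shape[2] for G in cores[:-1]]
err = np.linalg.norm(T - tt_reconstruct(cores))
print(ranks, err < 1e-8)
```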
4.2 Truncation Error
In practice, we truncate the singular value decomposition in step j after \(r_j\) terms, thus arriving at the representation
see also [35]. Note that, for \(j\ge 2\), the singular values \(\{\lambda _j(\alpha )\}_{\alpha \in {\mathbb {N}}}\) in this estimate do not coincide with the singular values from the original continuous tensor train decomposition due to the truncation.
We next shall give bounds on the truncation error. In the j-th step of the algorithm, \(j=2,3,\ldots ,m-1\), one needs to approximate the vector-valued function
by a vector-valued singular value decomposition. This means that we consider the singular value decomposition (2.9) for a vector-valued function in case of the domains \(\varOmega _j\) and \(\varOmega _{j+1}\times \dots \times \varOmega _m\).
For \(f\in H^{k+n_{m-1}}(\varOmega _1\times \dots \times \varOmega _m)\), it holds \({\varvec{g}}_2\in [H^{k+n_{m-1}}(\varOmega _2\times \dots \times \varOmega _m)]^{r_1}\) and
according to Lemma 4, precisely in its vectorized version (2.10). It follows \({\varvec{g}}_3 \in [H^{k+n_{m-1}}(\varOmega _3\times \dots \times \varOmega _m)]^{r_2}\) and, again by (2.10),
Estimate (4.2) shows that the \(H^k\)-seminorm of the vector-valued functions \({\varvec{g}}_j\) stays bounded by \(|f|_{H^k(\varOmega _1\times \dots \times \varOmega _m)}\). But according to (2.9), we have in the j-th step only the truncation error estimate
We summarize our findings in the following theorem, which holds in this form also if the subdomains \(\varOmega _j\subset {\mathbb {R}}^{n_j}\) are not ordered in such a way that \(n_1\le \dots \le n_m\).
Theorem 4
Let \(f\in H^{k+\max \{n_1,\ldots ,n_m\}}(\varOmega _1\times \dots \times \varOmega _m)\) for some fixed \(k>0\) and \(0<\varepsilon <1\). Then, the overall truncation error of the tensor train decomposition with truncation ranks (4.3) is
while the storage cost is bounded by \({\mathcal {O}}(\varepsilon ^{-(2m-1) \max \{n_1,\ldots ,n_{m-1}\}/k})\).
Remark 2
If \(n:=n_1=\dots =n_m\), then the cost of the tensor train decomposition is \({\mathcal {O}}(\varepsilon ^{-(2m-1)n/k})\). Thus, the cost is roughly the square of the cost of the Tucker decomposition. However, in practice, one performs m/2 forward steps and m/2 backward steps. This means one computes m/2 steps as described above to successively separate \({\varvec{x}}_1,{\varvec{x}}_2,\ldots ,{\varvec{x}}_{m/2}\) from the other variables. Then, one performs the algorithm in the opposite direction, i.e., one successively separates \({\varvec{x}}_{m},{\varvec{x}}_{m-1}, \ldots ,{\varvec{x}}_{m/2+1}\) from the other variables. This way, the overall cost is reduced to the order \({\mathcal {O}}(\varepsilon ^{-mn/k})\).6
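A quick arithmetic check of this cost reduction, with hypothetical parameter values:

```python
# Hypothetical parameters: accuracy eps, m domains of dimension n, smoothness k.
eps, m, n, k = 0.1, 8, 2, 4.0

one_way = eps ** (-(2 * m - 1) * n / k)   # one-directional sweep
two_way = eps ** (-m * n / k)             # m/2 forward plus m/2 backward steps
saving = one_way / two_way                # equals eps^{-(m-1)n/k}
print(saving > 1.0)
```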
4.3 Sobolev Spaces with Dimension Weights
Like for the Tucker decomposition, the cost of the tensor train decomposition suffers from the curse of dimension as the number m of subdomains increases. We therefore discuss again appropriate Sobolev spaces with dimension weights, where we assume for reasons of simplicity that all subdomains are identical to a single domain \(\varOmega \subset {\mathbb {R}}^n\) of dimension n.
Theorem 5
Given \(\delta > 0\), let \(f\in H_{\varvec{\gamma }}^{k+n}(\varOmega ^m)\) for some fixed \(k>0\) with weights (3.4) that decay like (3.5). For \(0<\varepsilon <1\), choose the ranks successively in accordance with
if \(j\le M\) and \(r_j = 0\) if \(j>M\). Here, M is given by
$$\begin{aligned} M = \varepsilon ^{-1/(1+\delta ')}. \end{aligned}$$
(4.6)
Then, the error of the continuous tensor train decomposition is of order \(\varepsilon \), while the storage cost of \(f_{r_1,\ldots ,r_m}^{{\text {TT}}}\) stays bounded by \(M\varepsilon ^{-2n/k}\) independent of the dimension m.
Proof
The combination of Theorem 1, (3.4) and (4.5) implies
Hence, as in the proof of Theorem 3, the approximation error of the continuous tensor train decomposition is bounded by a multiple of \(\varepsilon \) independent of m.
and is hence bounded independently of m in view of (4.6). \(\square \)
5 Discussion and Conclusion
In the present article, we considered the continuous versions of the Tucker tensor format and of the tensor train format for the approximation of functions which live on an m-fold product of arbitrary subdomains. By considering (isotropic) Sobolev smoothness, we derived estimates on the ranks to be chosen in order to realize a prescribed target accuracy. These estimates exhibit the curse of dimension.
Both tensor formats have in common that only the variable associated with a single domain is separated from the other variables at a time by means of the singular value decomposition. This enables cheaper storage schemes, while the influence of the overall dimension of the product domain is reduced to a minimum.
We also examined the situation of Sobolev spaces with dimension weights. Having sufficiently fast decaying weights helps to beat the curse of dimension as the number of subdomains tends to infinity. It turned out that algebraically decaying weights are appropriate for both the Tucker tensor format and the tensor train format.
We finally remark that we considered here only the ranks of the tensor decomposition in the continuous case, i.e., for functions and not for tensors of discrete data. Of course, an additional projection step onto suitable finite-dimensional trial spaces on the individual domains would be necessary to arrive at a fully discrete approximation scheme that can really be used in computer simulations. This would impose a further error of discretization type which needs to be balanced with the truncation error of the particular continuous tensor format.
Acknowledgements
Michael Griebel was partially supported by the Sonderforschungsbereich 1060 The Mathematics of Emergent Effects funded by the Deutsche Forschungsgemeinschaft. Both authors would like to thank Reinhold Schneider (Technische Universität Berlin) very much for fruitful discussions about tensor approximation.
Open AccessThis article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Note that the kernel function of \({\mathcal {S}}_f^\star {\mathcal {S}}_f\) is matrix-valued, while the kernel function of \({\mathcal {S}}_f{\mathcal {S}}_f^\star \) is scalar-valued.
In Theorem 3, no truncation of the dimension is applied, as it would be required in practice if the number m of domains tends to infinity. Note that the dimension truncation is indeed here the same as for the tensor train decomposition later on, see also Theorem 5.
The considerations in this section are based upon [5]. Nonetheless, the results derived there are not correct. The authors did not consider the impact of the vector-valued singular value decomposition in a proper way, which indeed does result in the curse of dimension.
If the spatial dimensions \(n_j\), \(j=1,\ldots ,m\), of the subdomains are different, one can balance the number of forward and backward steps in a better way to reduce the cost further.