1.1 The Condition Number of Tensor Rank Decomposition
In this article, a tensor is a multidimensional array filled with numbers:
$$\begin{aligned} \mathscr {A} = [a_{i_1,\ldots ,i_d}]_{1\le i_1 \le n_1,\ldots ,1\le i_d\le n_d} \in \mathbb {R}^{n_1\times \cdots \times n_d}. \end{aligned}$$
The integer \(d\) is called the order of \(\mathscr {A}\). The tensor product of \(d\) vectors \(\mathbf {u}^{1} \in \mathbb {R}^{n_1},\ldots , \mathbf {u}^{d} \in \mathbb {R}^{n_d}\) is defined to be the tensor \(\mathbf {u}^{1} \otimes \cdots \otimes \mathbf {u}^{d} \in \mathbb {R}^{n_1 \times \cdots \times n_d}\) with entries
$$\begin{aligned} (\mathbf {u}^{1} \otimes \cdots \otimes \mathbf {u}^{d})_{i_1,\ldots ,i_d} := u_{i_1}^{(1)} \cdots u_{i_d}^{(d)}, \text { where } \mathbf {u}^{j} = [u_i^{(j)}]_{1\le i\le n_j}. \end{aligned}$$
Any nonzero multidimensional array obeying this relation is called a rank-1 tensor. Not every multidimensional array is a rank-1 tensor, but every tensor \(\mathscr {A} \in \mathbb {R}^{n_1\times \cdots \times n_d}\) is a finite linear combination of rank-1 tensors:
$$\begin{aligned} \mathscr {A} = \mathscr {A}_1 + \cdots + \mathscr {A}_r, \quad \text {where } \mathscr {A}_i = \lambda _i \, \mathbf {u}_{i}^{1} \otimes \cdots \otimes \mathbf {u}_{i}^{d}. \end{aligned}$$
(1)
Hitchcock [50] coined the name polyadic decomposition for the decomposition Eq. (1). The smallest number \(r\) for which \(\mathscr {A}\) admits an expression as in Eq. (1) is called the (real) rank of \(\mathscr {A}\). A corresponding minimal decomposition is called a canonical polyadic decomposition (CPD).
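To make the notation concrete, the following minimal NumPy sketch (our own illustration; the format \(3\times 4\times 5\) and the rank \(r=2\) are arbitrary choices) builds two rank-1 tensors with the entry formula above and sums them into a polyadic decomposition as in Eq. (1).

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3 = 3, 4, 5  # an arbitrary tensor format

def rank1(u, v, w):
    # Entries (u (x) v (x) w)_{ijk} = u_i * v_j * w_k, cf. the formula above.
    return np.einsum('i,j,k->ijk', u, v, w)

# Two rank-1 terms and their sum: a polyadic decomposition with r = 2.
A1 = rank1(rng.standard_normal(n1), rng.standard_normal(n2), rng.standard_normal(n3))
A2 = rank1(rng.standard_normal(n1), rng.standard_normal(n2), rng.standard_normal(n3))
A = A1 + A2  # an order-3 tensor of rank (at most) 2
print(A.shape)  # (3, 4, 5)
```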
For instance, in algebraic statistics [1, 59], chemical sciences [67], machine learning [4], psychometrics [54], signal processing [35, 36, 65], or theoretical computer science [29], the input data have the structure of a tensor and the CPD of this tensor reveals the information of interest. Usually, this data is subject to measurement errors, which will cause the CPD computed from the measured data to differ from the CPD of the true data. In numerical analysis, the sensitivity of the model parameters, such as the rank-1 summands in the CPD, to perturbations of the data is often quantified by the condition number [61].
When there are multiple CPDs of a tensor \(\mathscr {A}\), the condition number must be defined at a decomposition. However, in this article, we will restrict our analysis to tensors having a unique decomposition. Such tensors are called identifiable. In this case, the condition number of the tensor rank decomposition of a tensor \(\mathscr {A}\) is well-defined, and we denote it by \(\kappa (\mathscr {A})\). We will explain the notion of identifiability of tensors in greater detail in Section 1.3. At this point, the reader should mainly bear in mind that the assumption of identifiability is comparatively weak, as most tensors of low rank satisfy it. However, note that matrices (\(d=2\)) are never identifiable, so we assume that the order of the tensor is \(d\ge 3\).
The condition number of tensor rank decomposition was characterized in [20], and it is the condition number of the following computational problem: On input \(\mathscr {A}\) of rank \(r\), compute the set of rank-1 terms \(\{\mathscr {A}_1,\ldots ,\mathscr {A}_r\}\) in the decomposition Eq. (1). This condition number measures the sensitivity of the rank-1 terms with respect to perturbations of the tensor \(\mathscr {A}\). In other words, when the condition number \(\kappa (\mathscr {A})\) of the rank-\(r\) identifiable tensor \(\mathscr {A}\) in Eq. (1) is finite, it is the smallest value such that
$$\begin{aligned} \min _{\pi \in \mathfrak {S}_r} \sqrt{\sum _{i=1}^r \Vert \mathscr {A}_i - \mathscr {B}_{\pi (i)} \Vert ^2} \le \kappa (\mathscr {A}) \, \Vert \mathscr {A} - \mathscr {B} \Vert + o(\Vert \mathscr {A} - \mathscr {B} \Vert ) \end{aligned}$$
(2)
holds for all rank-\(r\) tensors \(\mathscr {B} = \mathscr {B}_1 + \cdots + \mathscr {B}_r\) (with \(\mathscr {B}_i\) of rank 1) sufficiently close to \(\mathscr {A}\). Herein, the norm on \(\mathbb {R}^{n_1\times \cdots \times n_d}\) is the usual Euclidean norm, and \(\mathfrak {S}_r\) is the permutation group on \(\{1,\ldots ,r\}\). It was shown in [24, Corollary 5.5] that the same expression holds if \(\mathscr {B}\) is any tensor close to \(\mathscr {A}\) and \(\mathscr {B}_1 + \cdots + \mathscr {B}_r\) is the best rank-\(r\) approximation of \(\mathscr {B}\) in the Euclidean norm.
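The bound Eq. (2) can be probed numerically. The sketch below (our own illustration, not part of the original exposition) perturbs the rank-1 terms of a random rank-2 tensor and evaluates the left-hand side of Eq. (2) divided by \(\Vert \mathscr {A} - \mathscr {B}\Vert \); for small perturbations this ratio is, up to first order, a lower bound on \(\kappa (\mathscr {A})\).

```python
import numpy as np
from itertools import permutations

rng = np.random.default_rng(1)
shape, r, eps = (3, 4, 5), 2, 1e-6

def rank1(vs):
    return np.einsum('i,j,k->ijk', *vs)

# A rank-2 tensor A with known rank-1 terms, and a nearby rank-2 tensor B.
us = [[rng.standard_normal(n) for n in shape] for _ in range(r)]
A_terms = [rank1(v) for v in us]
B_terms = [rank1([u + eps * rng.standard_normal(n) for u, n in zip(v, shape)])
           for v in us]
A, B = sum(A_terms), sum(B_terms)

# Left-hand side of Eq. (2): best matching of the rank-1 terms over permutations.
lhs = min(np.sqrt(sum(np.linalg.norm(A_terms[i] - B_terms[p[i]])**2
                      for i in range(r)))
          for p in permutations(range(r)))
print(lhs / np.linalg.norm(A - B))  # first-order lower bound on kappa(A)
```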
As a general principle in numerical analysis, the condition number is an intrinsic property of the computational problem that governs the forward error and attainable precision of any method for solving the problem. Its study is also useful for other purposes. For example, in [21, 22] the local rate of convergence of Riemannian Gauss–Newton optimization methods for computing the CPD was related to the condition number \(\kappa (\mathscr {A})\).
A conventional wisdom in numerical analysis is that computing the condition number of a given problem instance is harder than solving the problem itself [38, 39]. This viewpoint led Smale to initiate the study of the probability distribution of condition numbers: If the condition number is small with high probability, then for many practical purposes one can assume that any given input is well-conditioned; at least the probability of failure will be small. Smale started by studying the probability that a polynomial is ill-conditioned [66]. This strategy was extended to linear algebra condition numbers [26, 31, 41], to systems of polynomial equations in diverse settings [42, 62], to linear systems of inequalities [49], to linear and convex programming [2, 68], to eigenvalue and eigenvector problems in the classic and other settings [7], to polynomial eigenvalue problems [6, 8], and to other computational models [30], among others. As the bibliography on this topic is substantial, we refer the reader to [28] for further details.
Tensor rank decomposition seems to be no exception to this wisdom: The characterization of \(\kappa (\mathscr {A})\) for a given \(\mathscr {A}\) in [20] requires the CPD of \(\mathscr {A}\) itself. This forces us to rely on probabilistic studies to establish reasonable a priori values of the condition number. Settling this is the main purpose of this paper.
The first probabilistic analyses of the condition number of the CPD were given in [11, 23]. In those references, the expected value was computed for random rank-1 tensors; that is, for a random output of the computational problem of computing CPDs. This amounts to choosing random \(\mathbf {u}_{i}^{k}\) in the notation above, constructing the corresponding tensor \(\mathscr {A}\), and studying \(\kappa (\mathscr {A})\). The probabilistic study is feasible, in principle, because one can obtain a closed expression for \(\kappa (\mathscr {A})\) which is polynomial in terms of the \(\mathbf {u}_{i}^{k}\), so that the question boils down to an explicit but nontrivial integration problem.
This article is the first to investigate the condition number for random input. That is, we assume that \(\mathscr {A}\) is chosen at random within the set of rank-\(r\) tensors (see the definition of random tensors in Definition 1 and the extension in Theorem 4), and we ask for the expected value of \(\kappa (\mathscr {A})\). The difficulty now is that even if we assume that a decomposition (1) exists, we do not have it at hand and hence lack a closed expression for \(\kappa (\mathscr {A})\).
One may wonder whether these two different random procedures give similar distributions in this or other numerical problems. The answer is no. For example, say that our problem is to compute the kernel of a given matrix \(A\in \mathbb {R}^{n\times (n+1)}\) and we want to study the expected value of the associated condition number \(\Vert A\Vert \,\Vert A^\dagger \Vert \). Choosing \(A\) at random produces \(\mathbb {E}(\Vert A\Vert \,\Vert A^\dagger \Vert )<\infty \), but choosing the kernel at random and then \(A\) at random within the matrices with that kernel is the same as computing the expected value of Turing's condition number of a square real Gaussian matrix, which is infinite; see [31] for precise estimates of these quantities. The situation is similar in the study of systems of homogeneous polynomial equations: random inputs have a better condition number than inputs produced from random outputs; see for example [9]. In both examples, the condition number of an input constructed from a random output is, on average, larger than the condition number of a random input. This is a stroke of luck, since in general one expects instances from practical, real-life problems to be somehow random within the input space, not to have a random output!
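This contrast is easy to observe experimentally. The following sketch (our own illustration; the size and sample count are arbitrary) compares the condition number \(\Vert A\Vert \,\Vert A^\dagger \Vert \) of Gaussian matrices \(A \in \mathbb {R}^{n\times (n+1)}\), which model random input, with that of square Gaussian matrices, which model the random-output construction and whose Turing condition number has a heavy tail with infinite expectation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials = 20, 2000

def cond(A):
    s = np.linalg.svd(A, compute_uv=False)
    return s[0] / s[-1]  # equals ||A|| * ||A^dagger|| for full-rank A

rect = [cond(rng.standard_normal((n, n + 1))) for _ in range(trials)]
square = [cond(rng.standard_normal((n, n))) for _ in range(trials)]

# The rectangular ensemble concentrates; the square ensemble shows rare
# but huge outliers, the signature of an infinite expected value.
print("rectangular: mean %.1f, max %.1f" % (np.mean(rect), np.max(rect)))
print("square:      mean %.1f, max %.1f" % (np.mean(square), np.max(square)))
```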
In this paper, we show that computing the CPD is a rara avis: we prove in Theorems 1 and 2 that (under suitable hypotheses) the expected condition number of random input tensors is infinite. In contrast, the results of Beltrán et al. [11] and Breiding and Vannieuwenhoven [23] suggest that the average condition number is finite when choosing a random output. This reinforces the evidence that computing CPDs is a very challenging computational problem.
The literature often cites the result of Håstad [53] to underline the high computational complexity of computing CPDs. Håstad showed that the NP-complete 3-satisfiability problem (also called 3-SAT) can be reduced to computing the rank of a tensor; hence, solving the tensor rank decomposition problem is NP-hard in the Turing machine computational model. Our main result differs in two aspects: First, Håstad's result concerns worst-case instances, whereas we show that computing the CPD is difficult on average. Second, our evidence supporting the hardness of the problem is not based on Turing machine complexity, but is obtained by analyzing the condition number, which is more appropriate for numerical computations [16]. Linking complexity analyses to condition numbers is common in the literature; for instance, in the case of solving polynomial systems [9, 27, 56, 63]. In general, the book [28] provides a good overview. In this interpretation, we show that computing CPDs numerically is hard on average.
On the other hand, in the literature, the main result of de Silva and Lim [37] is often cited as a key reason why approximating a tensor by a low-rank CPD is such a challenging problem: for some input tensors, a best low-rank approximation may not exist! This is because the set of tensors of bounded rank is not closed: There are tensors of rank strictly greater than \(r\) that can be approximated arbitrarily well by rank-\(r\) tensors. It is shown in [37] that this ill-posedness of the approximation problem is not rare, in the sense that for every tensor space \(\mathbb {R}^{n_1 \times n_2 \times n_3}\) there exists an open set of input tensors which do not admit a best rank-2 approximation. This result is stronger than Håstad's in the sense that it proves that instances with no solution to the tensor rank approximation problem occur on an open set, rather than on one particular set of measure zero. Notwithstanding this key result, it does not tell us about the complexity of solving the tensor rank decomposition problem, in which we are given a rank-\(r\) tensor whose CPD we seek. In this setting, there are no ill-posed inputs in the sense of de Silva and Lim [37]. It was already shown in [20] that the condition number diverges as one moves toward the open part of the boundary of the set of tensors of bounded rank, entailing that there exist regions with arbitrarily high condition number. One of the main results of this paper, Theorem 2, shows that such regions cannot be ignored: They are sufficiently large to cause the integral of the condition number over the set of rank-\(r\) tensors to diverge. In other words, one cannot neglect the regions where the condition number is so high that a CPD computed from a floating-point representation of a rank-\(r\) tensor in \(\mathbb {R}^{n_1 \times \cdots \times n_d}\), subject only to roundoff errors, is meaningless; this result is similar in spirit to de Silva and Lim [37].
One may conclude from the above that, at least from the viewpoint of the average stability of the problem, tensor rank decomposition is doomed to fail. However, if one only cares about the directions of the rank-1 terms in the decomposition, then the situation changes dramatically. The condition number associated with the computational problem "Given a rank-\(r\) identifiable tensor \(\mathscr {A}\) as in Eq. (1), output the set of normalized rank-1 tensors \(\{\mathscr {A}_1/\Vert \mathscr {A}_1\Vert ,\ldots ,\mathscr {A}_r/\Vert \mathscr {A}_r\Vert \}\)" will be called the angular condition number \(\kappa _{\mathrm {ang}}(\mathscr {A})\). Analogously to the bound Eq. (2), one can show that when \(\kappa _{\mathrm {ang}}\) is finite, it is the smallest number such that
$$\begin{aligned} \min _{\pi \in \mathfrak {S}_r} \sqrt{\sum _{i=1}^r \left\Vert \frac{\mathscr {A}_i}{\Vert \mathscr {A}_i\Vert } - \frac{\mathscr {B}_{\pi (i)}}{\Vert \mathscr {B}_{\pi (i)}\Vert } \right\Vert ^2} \le \kappa _{\mathrm {ang}}(\mathscr {A}) \, \Vert \mathscr {A} - \mathscr {B} \Vert + o(\Vert \mathscr {A} - \mathscr {B} \Vert ) \end{aligned}$$
holds for all rank-\(r\) tensors \(\mathscr {B} = \mathscr {B}_1 + \cdots + \mathscr {B}_r\) (with \(\mathscr {B}_i\) rank-1 tensors) in a sufficiently small open neighborhood of \(\mathscr {A}\). By Breiding and Vannieuwenhoven [24, Corollary 5.5], the same expression holds for all tensors \(\mathscr {B}\) in a small open neighborhood of \(\mathscr {A}\) if \(\mathscr {B}_1 + \cdots + \mathscr {B}_r\) is the best rank-\(r\) approximation of \(\mathscr {B}\).
We will prove in Theorem 3 that, at least in the case of rank-2 tensors, the angular condition number \(\kappa _{\mathrm {ang}}\) for random inputs is finite, contrary to the classic condition number \(\kappa \); in fact, the numerical experiments in Sect. 7 suggest that this finite average condition number extends to much higher ranks as well. In other words, on average we may expect to be able to recover the angular part of the CPD:
$$\begin{aligned} \frac{\mathscr {A}_1}{\Vert \mathscr {A}_1\Vert },\ldots ,\frac{\mathscr {A}_r}{\Vert \mathscr {A}_r\Vert }, \end{aligned}$$
where \(\mathscr {A}_i\) is as in Eq. (1). One could conclude from this that a tensor decomposition algorithm should aim to produce the normalized rank-1 terms \(\mathscr {A}_i/\Vert \mathscr {A}_i\Vert \) of the tensor rank decomposition Eq. (1) accurately. Once these terms are obtained, one can recover the \(\lambda _i\)'s by solving a linear system of equations. Since, as a general principle, the condition number of a composite smooth map \(g \circ f\) between manifolds satisfies [16, 28]
$$\begin{aligned} \kappa [g \circ f](x) := \Vert (\mathrm {d}{}_{f(x)} g) (\mathrm {d}{}_{x} f) \Vert \le \Vert \mathrm {d}{}_{f(x)} g \Vert \Vert \mathrm {d}{}_{x} f \Vert = \kappa [g](f(x)) \, \kappa [f](x), \end{aligned}$$
it follows that the condition number of tensor decomposition is bounded by the product of the condition numbers of the problem of finding the angular part of the CPD and the condition number of solving a linear least-squares problem. Our main results suggest that precisely the last problem will on average be ill-conditioned.
The foregoing observation can have major implications for algorithm design. Indeed, solving the tensor rank decomposition problem by first solving for the angular part and then for the linear least-squares part decomposes the problem into a nonlinear and a linear part. Crucially, the latter least-squares problem can be solved by direct methods, such as a QR factorization combined with a linear system solver. Such methods have a uniform computational cost regardless of the condition number of the problem. By contrast, since no (provably) numerically stable direct algorithms for tensor rank decomposition are currently known [11], iterative methods are indispensable for this problem. We may expect their computational performance to depend on the condition number of the problem instance. Indeed, our main results combined with the main result of Breiding and Vannieuwenhoven [21] imply, for example, that Riemannian Gauss–Newton optimization methods for solving the angular part of the CPD should, on average, require fewer iterations to reach convergence than Riemannian Gauss–Newton methods for solving the tensor decomposition problem directly (such as the methods in [21, 22]), because the angular condition number \(\kappa _{\mathrm {ang}}\) appears to be finite on average, while the regular condition number \(\kappa \) is proved to be \(\infty \) on average in most cases, as we show in this article.
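A minimal sketch of this two-stage strategy (our own illustration, which assumes the angular part has already been computed by some method): given the normalized rank-1 terms, the coefficients \(\lambda _i\) are recovered by a direct linear least-squares solve on the vectorized tensor.

```python
import numpy as np

rng = np.random.default_rng(3)
shape, r = (3, 4, 5), 2

def rank1(vs):
    return np.einsum('i,j,k->ijk', *vs)

# Ground truth: a rank-2 tensor with known coefficients lambda_i.
lams = np.array([2.0, -0.5])
units = []
for _ in range(r):
    T = rank1([rng.standard_normal(n) for n in shape])
    units.append(T / np.linalg.norm(T))  # normalized rank-1 terms
A = sum(l * U for l, U in zip(lams, units))

# Stage 2 (linear): solve min_lambda ||M lambda - vec(A)|| by a direct method.
M = np.column_stack([U.ravel() for U in units])  # (n1*n2*n3) x r matrix
lams_rec, *_ = np.linalg.lstsq(M, A.ravel(), rcond=None)
print(lams_rec)  # recovers [2.0, -0.5] up to roundoff
```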
Our main results also have consequences for researchers testing numerical algorithms for computing the CPD. In the literature, a common way of generating input data for testing algorithms is to sample the rank-1 terms \(\mathscr {A}_i\) randomly and then apply the algorithm to the associated tensor \(\mathscr {A} = \mathscr {A}_1 + \cdots + \mathscr {A}_r\). However, our analysis in this paper and the analyses in [11, 23] show that this procedure generates tensors that are heavily biased toward being numerically well-conditioned. Hence, this way of testing algorithms probably does not correspond to a realistic distribution of the inputs. We acknowledge that it is currently not easy to sample rank-\(r\) tensors uniformly, even though some methods exist [18]. In part, this is because equations for the algebraic variety containing the tensors of rank bounded by \(r\) are hard to obtain [57]. Nevertheless, in Sect. 7, using the observation from Remark 2, we present an acceptance–rejection method that can be applied in a few cases and yields rank-\(r\) tensors distributed according to the Gaussian density in Definition 1. In any case, we strongly advocate that the (range of) condition numbers be reported when testing the performance of iterative methods for solving the tensor rank decomposition problem, so that one can assess the difficulty of the problem instances. We also recommend always including models that are known to lead to instances with high condition numbers, such as those used in [20, 22].
The formal presentation of our main results requires some extra notation that we introduce in subsequent sections.
1.3 Identifiable Tensors and a Formula for the Condition Number
A particular feature of higher-order tensors that distinguishes them from matrices is identifiability. This means that in many cases the CPD of a tensor of order \(d\ge 3\) of small rank is unique. A tensor \(\mathscr {A}\) is called \(r\)-identifiable if there is a unique set \(\{\mathscr {A}_1,\ldots ,\mathscr {A}_r\}\) of cardinality \(r\) such that \(\mathscr {A} = \mathscr {A}_1 + \cdots + \mathscr {A}_r\) and all \(\mathscr {A}_i\)'s are rank-1 tensors. A celebrated criterion by Kruskal [55] gives a tool to decide whether a given tensor of order 3 satisfies this property.
Since matrix rank does not change under the field extension from \(\mathbb {R}\) to \(\mathbb {C}\), a real rank-\(r\) tensor \(\mathscr {A}\) that satisfies the assumptions of Lemma 1 is \(r\)-identifiable over \(\mathbb {R}\) and also automatically \(r\)-identifiable over \(\mathbb {C}\). In other words, Kruskal's criterion certifies complex \(r\)-identifiability of tensors, which is a strictly stronger notion than \(r\)-identifiability over \(\mathbb {R}\) [5].
Most order-3 tensors of low rank satisfy Kruskal's criterion [34]: There is an open dense subset of the set of rank-\(r\) tensors in \(\mathbb {R}^{n_1 \times n_2 \times n_3}\), \(n_1 \ge n_2 \ge n_3 \ge 2\), where complex \(r\)-identifiability holds, provided \(r \le n_1 + \min \{ \tfrac{1}{2} \delta , \delta \}\) with \(\delta := n_2 + n_3 - n_1 - 2\). In fact, this phenomenon occurs much more generally than for third-order tensors of very small rank. Let us denote the set of complex tensors of complex rank bounded by \(r\) by
$$\begin{aligned} \sigma _{r;n_1,\ldots ,n_d}^\mathbb {C} := \{ \mathscr {A} \in \mathbb {C}^{n_1 \times \cdots \times n_d} \mid \mathrm {rank}(\mathscr {A}) \le r \}. \end{aligned}$$
This constructible set turns out to be an open dense subset (in the Euclidean topology) of its Zariski closure \(\overline{\sigma _{r;n_1,\ldots ,n_d}^\mathbb {C}}\); see [57]. One says that \(\sigma _{r;n_1,\ldots ,n_d}^\mathbb {C}\) is generically complex \(r\)-identifiable if the subset of points of \(\sigma _{r;n_1,\ldots ,n_d}^\mathbb {C}\) that are not complex \(r\)-identifiable is contained in a proper closed subset in the Zariski topology on the algebraic variety \(\overline{\sigma _{r;n_1,\ldots ,n_d}^\mathbb {C}}\); see [32]. It is known from dimensionality arguments [32] that there is a maximum value of \(r\) for which generic \(r\)-identifiability of \(\sigma _{r;n_1,\ldots ,n_d}\) can hold, namely
$$\begin{aligned} r \le r^{\text {crit}}_{n_1,\ldots ,n_d}, \quad \text { where }\quad r^{\text {crit}}_{n_1,\ldots ,n_d}:=\frac{n_1 \cdots n_d}{1 + \sum _{k=1}^d (n_k - 1)}. \end{aligned}$$
(3)
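For example, for \(n_1 = n_2 = n_3 = 3\) this critical value is \(r^{\text {crit}}_{3,3,3} = \frac{27}{1 + 2 + 2 + 2} = \frac{27}{7} \approx 3.86\), so generic \(r\)-identifiability of \(\sigma _{r;3,3,3}\) can hold at most for \(r \le 3\).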
In fact, it is conjectured that the inequality is strict in general; see [47] for details. For all other values of \(r\), generic \(r\)-identifiability does not hold. In [17, 32, 33, 40], it is proved that for the majority of choices of \(n_1, \ldots , n_d\), generic complex \(r\)-identifiability holds for most ranks with \(r < r^{\text {crit}}_{n_1,\ldots ,n_d}\); see [17, Theorem 7.2] for a result that is asymptotically optimal. For a summary of the conjecturally complete picture of complex \(r\)-identifiability results, see [34, Section 3].
We make this assumption because it greatly simplifies some of the arguments. At the same time, Assumption 1 is (conjectured to be) extremely weak and only limits the generality in the exceptional cases listed in [33, Theorem 1.1]; even then, generic \(r\)-identifiability only fails very close to the upper bound \(r^{\text {crit}}_{n_1,\ldots ,n_d}\) of the permitted ranks.
An immediate benefit of Assumption 1 is that it allows for a nice expression of the condition number of the tensor rank decomposition problem. Let us denote the set of rank-1 tensors in \(\mathbb {R}^{n_1\times \cdots \times n_d}\) by
$$\begin{aligned} \mathcal {S}_{n_1,\ldots ,n_d} = \{\mathbf {a}^{1} \otimes \cdots \otimes \mathbf {a}^{d} \mid \mathbf {a}^{k}\in \mathbb {R}^{n_k}\backslash \{0\}\}. \end{aligned}$$
It is a smooth manifold, called the Segre manifold [46, 57]. The set of tensors of rank bounded by \(r\) is the image of the addition map
$$\begin{aligned} \varPhi : \mathcal {S}_{n_1,\ldots ,n_d}^{\times r} \rightarrow \mathbb {R}^{n_1\times \cdots \times n_d}, \quad (\mathscr {A}_1,\ldots ,\mathscr {A}_r) \mapsto \mathscr {A}_1 + \cdots + \mathscr {A}_r; \end{aligned}$$
that is, \(\sigma _{r;n_1,\ldots ,n_d} = \varPhi (\mathcal {S}_{n_1,\ldots ,n_d}^{\times r})\). Then, under Assumption 1, there exists an open dense subset \(\mathcal {N}_{r;n_1,\ldots ,n_d}\) of \(\sigma _{r;n_1,\ldots ,n_d}\) such that for all \(\mathscr {A} \in \mathcal {N}_{r;n_1,\ldots ,n_d}\) the fiber \(\varPhi ^{-1}(\mathscr {A})\) is finite, consisting of the \(r!\) permutations of a single decomposition, by Beltrán et al. [11, Propositions 4.5–4.7].
In particular, the points in the fiber are isolated, so there is a local inverse map \(\varPhi _a^{-1}\) of \(\varPhi \) for each \(a \in \varPhi ^{-1}(\mathscr {A})\). Recall from [20] that the condition number of the CPD at \(\mathscr {A}\) is then the condition number (in the classic sense of Rice [61]; see also [28, 69]) of any of these local inverses:
$$\begin{aligned} \kappa (\mathscr {A}) := \lim _{\epsilon \rightarrow 0} \max _{\Vert \mathscr {A} - \mathscr {B} \Vert \le \epsilon } \frac{\Vert \varPhi _a^{-1}(\mathscr {A}) - \varPhi _a^{-1}(\mathscr {B}) \Vert }{\Vert \mathscr {A} - \mathscr {B} \Vert } = \Vert \mathrm {d}{}_{\mathscr {A}} \varPhi _a^{-1} \Vert , \end{aligned}$$
(5)
where \(a \in \varPhi ^{-1}(\mathscr {A})\) is arbitrary; it is a corollary of Breiding and Vannieuwenhoven [20, Theorem 1.1] that the above definition does not depend on the choice of \(a\). Herein, \(\Vert \cdot \Vert \) in the denominator is the Euclidean norm induced by the ambient \(\mathbb {R}^{n_1\times \cdots \times n_d}\), and the norm in the numerator is the product norm of the Euclidean norms inherited from the ambient \(\mathbb {R}^{n_1\times \cdots \times n_d}\)'s. The right-hand side \(\Vert \mathrm {d}{}_{\mathscr {A}} \varPhi _a^{-1} \Vert \) is the spectral norm of the derivative of \(\varPhi _a^{-1}\) at \(\mathscr {A}\). See Sect. 2 for more details. By Breiding and Vannieuwenhoven [20, Proposition 4.4], the condition number \(\kappa \) does not depend on the norm of \(\mathscr {A}\): \(\kappa (t\mathscr {A}) = \kappa (\mathscr {A})\) for \(t\in \mathbb {R}{\setminus }\{0\}\).
1.4 Main Results
The goal of this paper is to study the average condition number relative to "reasonable" density functions. By this, we mean probability densities \(\hat{\rho }\) that are comparable to the standard Gaussian density \(\rho \): There exist positive constants \(c_1,c_2\) such that \(c_1 \le \frac{{\hat{\rho }}}{\rho } \le c_2\). The main result, Theorem 4, applies, among others, to all distributions \({\hat{\rho }}\) comparable to the following Gaussian density defined on the set \(\sigma _{r;n_1,\ldots ,n_d}\) of tensors of bounded rank.
We first state our results for the foregoing Gaussian density. At the end of this subsection, in Theorem 4, we generalize these results to other densities, including all densities comparable to the Gaussian density. Our first contribution is the following result, which we prove in Sect. 3.
It should be mentioned that in our analysis we consider a small subset of \(\sigma _{2;n_1,\ldots ,n_d}\) and show that on this subset the condition number integrates to infinity. In particular, a weak average-case analysis as proposed in [3] would be of interest for this problem.
Under one additional assumption, we can extend the result from Theorem 1 to higher ranks. We prove the following theorem in Sect. 4.
By Bocci et al. [17, Theorem 7.2], the assumptions of Theorem 2 are satisfied in a large number of cases. In fact, as the size of the tensor increases, the assumptions become weaker: When \(n_1 \ge n_2 \ge \cdots \ge n_d \ge 2\), the conditions in Theorem 2 are satisfied for \(r\le \min (s_1,s_2)\) with
$$\begin{aligned} s_1=&\frac{n_1 n_2 - (n_1 + n_2 + n_3 - 2)}{n_1 n_2} r_{n_1,\ldots ,n_d}^{\text {crit}}\\ s_2=&\frac{(n_1-2) (n_2-2) - (n_1 + n_2 + n_3 - 8)}{(n_1-2) (n_2-2)} r_{n_1-2,\ldots ,n_d-2}^{\text {crit}} + 2. \end{aligned}$$
Note that for large \(n_i\), the second piece \(s_2\) is the most restrictive. Eq. (3) implies that \(r_{n_1-2,\ldots ,n_d-2}^\text {crit} = (1 - \delta _{n_1,\ldots ,n_d}){r_{n_1,\ldots ,n_d}^\text {crit}}\) with \(\delta _{n_1,\ldots ,n_d} = \mathcal {O}( \sum _{k=1}^d \frac{1}{n_k} )\). Therefore, we obtain the following asymptotically optimal result.
It follows from dimensionality arguments that if \(r>r^{\text {crit}}_{n_1,\ldots ,n_d}\), then the addition map \(\varPhi \) does not have a local inverse. In fact, in this case all of the connected components of the fiber of \(\varPhi \) at \(\mathscr {A}\) have positive dimension [46]. It follows from Breiding and Vannieuwenhoven [20] that the condition number of the tensor rank decomposition problem at each expression Eq. (1) of length \(r\) of such a tensor \(\mathscr {A}\) is \(\infty \). Hence \(\kappa (\mathscr {A}) = \infty \), regardless of how the tensor decomposition problem is defined when \(\mathscr {A}\) has multiple distinct decompositions; see also the discussion in [28, Remark 14.14]. In this case, the average condition number is infinite as well.
Our results lead us to the conjecture that the expected condition number is infinite, also without making the assumption from Theorem 2 and without any upper bound on the rank. Corollary 1 proves this conjecture asymptotically, in practice leaving only a small range of ranks for which it might fail.
As mentioned above, it turns out that for GITs the expected angular condition number is not always infinite. Formally, the angular condition number is defined as follows: Let \(p: \mathbb {R}^{n_1\times \cdots \times n_d} \rightarrow \mathbb {S}(\mathbb {R}^{n_1\times \cdots \times n_d})\) be the canonical projection onto the sphere. Then, the angular condition number of \(\mathscr {A}\) is
$$\begin{aligned} \kappa _{\mathrm {ang}}(\mathscr {A}) := \Vert \mathrm {d}{}_{\mathscr {A}} \left( (p \times \cdots \times p) \circ \varPhi _a^{-1} \right) \Vert , \end{aligned}$$
(6)
where \(\varPhi _a^{-1}\) is an arbitrary local inverse of \(\varPhi \) with \(\varPhi _a^{-1}(\mathscr {A}) = a\). As before, we do not specify what happens on the measure-zero set \(\sigma _{r;n_1,\ldots ,n_d}{\setminus }\mathcal {N}_{r;n_1,\ldots ,n_d}\), because it is not relevant for this paper. The angular condition number only accounts for the angular part of the CPD, i.e., the directions of the rank-1 terms, not their magnitudes; hence the name.
To distinguish the condition numbers Eqs. (5) and (6), we will refer to the condition number from Eq. (5) as the regular condition number. Oftentimes we even drop the clarification "regular."
Here is the result for \(\kappa _{\mathrm {ang}}\) for tensors of rank two, which we prove in Sect. 5.
Unfortunately, we do not know if this theorem can be extended to higher-rank tensors. However, based on our experiments in Sect. 7, we pose the following:
We finally observe that the foregoing main results are not limited to GITs. They are valid for a wide range of distributions of random tensors.