Published in: Foundations of Computational Mathematics 2/2023

Open Access 08-02-2022

The Average Condition Number of Most Tensor Rank Decomposition Problems is Infinite

Authors: Carlos Beltrán, Paul Breiding, Nick Vannieuwenhoven


Abstract

The tensor rank decomposition, or canonical polyadic decomposition, is the decomposition of a tensor into a sum of rank-1 tensors. The condition number of the tensor rank decomposition measures the sensitivity of the rank-1 summands with respect to structured perturbations. Those are perturbations preserving the rank of the tensor that is decomposed. On the other hand, the angular condition number measures the perturbations of the rank-1 summands up to scaling. We show for random rank-2 tensors that the expected value of the condition number is infinite for a wide range of choices of the density. Under a mild additional assumption, we show that the same is true for most higher ranks \(r\ge 3\) as well. In fact, as the dimensions of the tensor tend to infinity, asymptotically all ranks are covered by our analysis. On the contrary, we show that rank-2 tensors have finite expected angular condition number. Based on numerical experiments, we conjecture that this could also be true for higher ranks. Our results underline the high computational complexity of computing tensor rank decompositions. We discuss consequences of our results for algorithm design and for testing algorithms computing tensor rank decompositions.
Notes
Communicated by Joseph M. Landsberg.
CB: Supported by Spanish “Ministerio de Economía y Competitividad” Under Project PID2020-113887GB-I00, as well as by the Banco Santander and Universidad de Cantabria Under Project 21.SI01.64658. PB: Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation)—Projektnummer 445466444. NV: Supported by the Postdoctoral Fellowship of the Research Foundation—Flanders (FWO) with Project Numbers 12E8116N and 12E8119N and partially supported by KU Leuven project STG/19/002.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

1.1 The Condition Number of Tensor Rank Decomposition

In this article, a tensor is a multidimensional array filled with numbers:
$$\begin{aligned} \mathcal {A} = [a_{i_1,\ldots ,i_d}]_{1\le i_1\le n_1,\ \ldots ,\ 1\le i_d\le n_d} \in \mathbb {R}^{n_1\times \cdots \times n_d}. \end{aligned}$$
The integer d is called the order of \(\mathcal {A}\). The tensor product of d vectors \(\mathbf {u}^{1} \in \mathbb {R}^{n_1},\ldots , \mathbf {u}^{d} \in \mathbb {R}^{n_d}\) is defined to be the tensor \(\mathbf {u}^{1} \otimes \cdots \otimes \mathbf {u}^{d} \in \mathbb {R}^{n_1 \times \cdots \times n_d}\) with entries
$$\begin{aligned} (\mathbf {u}^{1} \otimes \cdots \otimes \mathbf {u}^{d})_{i_1,\ldots ,i_d} := u_{i_1}^{(1)} \cdots u_{i_d}^{(d)}, \text { where } \mathbf {u}^{j} = [u_i^{(j)}]_{1\le i\le n_j}. \end{aligned}$$
Any nonzero multidimensional array obeying this relation is called a rank-1 tensor. Not every multidimensional array represents a rank-1 tensor, but every tensor \(\mathcal {A} \in \mathbb {R}^{n_1 \times \cdots \times n_d}\) is a finite linear combination of rank-1 tensors:
$$\begin{aligned} \mathcal {A} = \mathcal {A}_1 + \cdots + \mathcal {A}_r, \quad \text {where each } \mathcal {A}_i = \mathbf {u}_{i}^{1} \otimes \cdots \otimes \mathbf {u}_{i}^{d} \text { is a rank-1 tensor.} \end{aligned}$$
(1)
Hitchcock [50] coined the name polyadic decomposition for the decomposition Eq. (1). The smallest number r for which \(\mathcal {A}\) admits an expression as in Eq. (1) is called the (real) rank of \(\mathcal {A}\). A corresponding minimal decomposition is called a canonical polyadic decomposition (CPD).
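To make these definitions concrete, the following minimal numpy sketch (our illustration, not part of the paper; the helper names rank1 and cpd_to_tensor are ours) builds rank-1 tensors entrywise as above and assembles a tensor of rank at most r as the sum in Eq. (1).
```python
import numpy as np

def rank1(vectors):
    """Tensor product u^1 x ... x u^d of a list of vectors, as a d-way array."""
    t = vectors[0]
    for u in vectors[1:]:
        t = np.tensordot(t, u, axes=0)  # outer product raises the order by one
    return t

def cpd_to_tensor(factors):
    """Sum of rank-1 terms; factors[i] is the tuple (u_i^1, ..., u_i^d)."""
    return sum(rank1(term) for term in factors)

rng = np.random.default_rng(0)
dims, r = (3, 4, 5), 2
terms = [[rng.standard_normal(n) for n in dims] for _ in range(r)]
A = cpd_to_tensor(terms)  # a tensor of rank at most 2 in R^{3 x 4 x 5}

# The entrywise definition of the tensor product agrees with this construction:
entry = terms[0][0][1] * terms[0][1][2] * terms[0][2][3] \
      + terms[1][0][1] * terms[1][1][2] * terms[1][2][3]
assert np.isclose(A[1, 2, 3], entry)
```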
For instance, in algebraic statistics [1, 59], chemical sciences [67], machine learning [4], psychometrics [54], signal processing [35, 36, 65], or theoretical computer science [29], the input data have the structure of a tensor and the CPD of this tensor reveals the information of interest. Usually, this data is subject to measurement errors, which will cause the CPD computed from the measured data to differ from the CPD of the true data. In numerical analysis, the sensitivity of the model parameters, such as the rank-1 summands in the CPD, to perturbations of the data is often quantified by the condition number [61].
When there are multiple CPDs of a tensor \(\mathcal {A}\), the condition number must be defined at a particular decomposition \(\mathcal {A} = \mathcal {A}_1 + \cdots + \mathcal {A}_r\). However, in this article, we will restrict our analysis to tensors \(\mathcal {A}\) having a unique decomposition. Such tensors are called identifiable. In this case, the condition number of the tensor rank decomposition of a tensor \(\mathcal {A}\) is well-defined, and we denote it by \(\kappa (\mathcal {A})\). We will explain the notion of identifiability of tensors in greater detail in Sect. 1.3. At this point, the reader should mainly bear in mind that the assumption of identifiability is rather weak, as most tensors of low rank satisfy it. However, note that matrices (\(d=2\)) are never identifiable, so we assume that the order of the tensor is \(d\ge 3\).
The condition number of tensor rank decomposition was characterized in [20], and it is the condition number of the following computational problem: On input \(\mathcal {A}\) of rank r, compute the set of rank-1 terms \(\{\mathcal {A}_1,\ldots ,\mathcal {A}_r\}\) in the decomposition Eq. (1). This condition number measures the sensitivity of the rank-1 terms with respect to perturbations of the tensor \(\mathcal {A}\). In other words, when the condition number \(\kappa (\mathcal {A})\) of the rank-r identifiable tensor \(\mathcal {A}\) in Eq. (1) is finite, it is the smallest value such that
$$\begin{aligned} \min _{\pi \in \mathfrak {S}_r} \sqrt{\sum _{i=1}^{r} \Vert \mathcal {A}_i - \mathcal {B}_{\pi (i)}\Vert ^2} \ \le \ \kappa (\mathcal {A})\, \Vert \mathcal {A} - \mathcal {B}\Vert + o(\Vert \mathcal {A} - \mathcal {B}\Vert ) \end{aligned}$$
(2)
holds for all rank-r tensors \(\mathcal {B} = \mathcal {B}_1 + \cdots + \mathcal {B}_r\) (with \(\mathcal {B}_i\) of rank 1) sufficiently close to \(\mathcal {A}\). Herein, the norm on \(\mathbb {R}^{n_1\times \cdots \times n_d}\) is the usual Euclidean norm, and \(\mathfrak {S}_r\) is the permutation group on \(\{1,\ldots ,r\}\). It was shown in [24, Corollary 5.5] that the same expression holds if \(\mathcal {B}\) is any tensor close to \(\mathcal {A}\) and \(\mathcal {B}_1 + \cdots + \mathcal {B}_r\) is the best rank-r approximation of \(\mathcal {B}\) in the Euclidean norm.
As a general principle in numerical analysis, the condition number is an intrinsic property of the computational problem that governs the forward error and attainable precision of any method for solving the problem. Its study is also useful for other purposes. For example, in [21, 22] the local rate of convergence of Riemannian Gauss–Newton optimization methods for computing the CPD was related to the condition number \(\kappa (\mathcal {A})\).
A conventional wisdom in numerical analysis is that it is harder to compute the condition number of a given problem instance than to solve the problem itself [38, 39]. This viewpoint led Smale to initiate the study of the probability distribution of condition numbers: If the condition number is small with high probability, then for many practical purposes one can assume that any given input is well-conditioned; at least the probability of failure will necessarily be small. Smale started by studying the probability that a polynomial is ill-conditioned [66]. This strategy was extended to linear algebra condition numbers [26, 31, 41], to systems of polynomial equations in diverse settings [42, 62], to linear systems of inequalities [49], to linear and convex programming [2, 68], to eigenvalue and eigenvector problems in the classic and other settings [7], to polynomial eigenvalue problems [6, 8], and to other computational models [30], among others. As there is a substantial bibliography on this topic, we refer the reader to [28] for further details.
Tensor rank decomposition seems to be no exception to this wisdom: The characterization of \(\kappa (\mathcal {A})\) for a given \(\mathcal {A}\) in [20] requires the CPD of \(\mathcal {A}\) itself. This forces us to rely on probabilistic studies to establish reasonable a priori values of the condition number. Settling this is the main purpose of this paper.

1.2 Informal Version of Our Main Results and Discussion

The first probabilistic analyses of the condition number of CPD were given in [11, 23]. In those references, the expected value was computed for random rank-1 tensors; that is, for random output of the computational problem of computing CPDs. This amounts to choosing random \(\mathbf {u}_{i}^{k}\) in the notation above, constructing the corresponding tensor \(\mathcal {A}\) and studying \(\kappa (\mathcal {A})\). The probabilistic study is feasible, in principle, because one can obtain a closed expression for \(\kappa (\mathcal {A})\) which is polynomial in terms of the \(\mathbf {u}_{i}^{k}\), so that the question boils down to an explicit but nontrivial integration problem.
This article is the first to investigate the condition number for random input. That is, we assume that \(\mathcal {A}\) is chosen at random within the set of rank-r tensors (see the definition of random tensors in Definition 1 and the extension in Theorem 4), and we ask for the expected value of \(\kappa (\mathcal {A})\). The difficulty now is that even if we assume that a decomposition (1) exists, we do not have it and hence we lack a closed expression for \(\kappa (\mathcal {A})\).
One may wonder if these two different random procedures should give similar distributions in this or other numerical problems. The answer is no. For example, say that our problem is to compute the kernel of a given matrix \(A\in \mathbb {R}^{n\times (n+1)}\) and we want to study the expected value of the associated condition number \(\Vert A\Vert \,\Vert A^\dagger \Vert \). Choosing A at random produces \(\mathbb {E}(\Vert A\Vert \,\Vert A^\dagger \Vert )<\infty \), but choosing the kernel at random and then A at random within the matrices with that kernel is the same as computing the expected value of the usual Turing condition number of a square real Gaussian matrix, which is infinite; see [31] for precise estimates of these quantities. The situation is similar in the study of systems of homogeneous polynomial equations: random inputs have a better condition number than inputs produced from random outputs; see for example [9]. In both these examples, the condition number of input constructed from random output is, on average, larger than the condition number of random input. This is a stroke of luck, since in general one expects instances from practical, real-life problems to be somewhat random within the input space, not to have a random output!
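This contrast is easy to observe numerically. The following sketch (our illustration; it only reproduces the qualitative behavior described above) estimates \(\Vert A\Vert \,\Vert A^\dagger \Vert \) for Gaussian matrices of size \(n\times (n+1)\) and of size \(n\times n\): the rectangular case stays moderate on average, while the square case, whose expected condition number is infinite, produces occasional huge samples.
```python
import numpy as np

def cond(A):
    # ||A||_2 * ||A^+||_2 = largest singular value / smallest nonzero singular value
    s = np.linalg.svd(A, compute_uv=False)
    return s[0] / s[-1]

rng = np.random.default_rng(1)
n, trials = 20, 2000
rect = [cond(rng.standard_normal((n, n + 1))) for _ in range(trials)]    # random input
square = [cond(rng.standard_normal((n, n))) for _ in range(trials)]      # Turing's kappa

print("n x (n+1): mean %8.1f   max %10.1f" % (np.mean(rect), np.max(rect)))
print("n x n    : mean %8.1f   max %10.1f" % (np.mean(square), np.max(square)))
# The square (Turing) case shows rare but enormous values, consistent with an
# infinite expectation; the rectangular case remains moderate.
```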
In this paper, we show that computing the CPD is a rara avis: we prove in Theorems 1 and 2 that (under suitable hypotheses) the expected condition number of random input tensors is infinite. By contrast, the results of Beltrán et al. [11] and Breiding and Vannieuwenhoven [23] suggest that the average condition number is finite when choosing a random output. This reinforces the evidence that computing CPDs is a very challenging computational problem.
The literature often cites the result of Håstad [53] to underline the high computational complexity of computing CPDs. Håstad showed that the NP-complete 3-satisfiability problem (also called 3-SAT) can be reduced to computing the rank of a tensor; hence, the tensor rank decomposition problem is NP-hard in the Turing machine computational model. Our main result differs in two aspects: First, Håstad's result concerns the difficulty of particular worst-case instances, whereas we show that computing the CPD is difficult on average. Second, our evidence supporting the hardness of the problem is not based on Turing machine complexity, but is obtained by analyzing the condition number, which is more appropriate for numerical computations [16]. Linking complexity analyses to condition numbers is common in the literature; for instance, in the case of solving polynomial systems [9, 27, 56, 63]. In general, the book [28] provides a good overview. In this interpretation, we show that computing CPDs numerically is hard on average.
On the other hand, in the literature, the main result of de Silva and Lim [37] is often cited as a key reason why approximating a tensor by a low-rank CPD is such a challenging problem: for some input tensors, a best low-rank approximation may not exist! This is because the set of tensors of bounded rank is not closed: There are tensors of rank strictly greater than r that can be approximated arbitrarily well by rank-r tensors. It is shown in [37] that this ill-posedness of the approximation problem is not rare, in the sense that for every tensor space \(\mathbb {R}^{n_1 \times n_2 \times n_3}\) there exists an open set of input tensors which do not admit a best rank-2 approximation. This result is stronger than Håstad's in the sense that it proves that instances with no solution to the tensor rank approximation problem may occur on an open set, rather than only on a set of measure zero. Notwithstanding this key result, it does not tell us about the complexity of solving the tensor rank decomposition problem, in which we are given a rank-r tensor whose CPD we seek. In this setting, there are no ill-posed inputs in the sense of de Silva and Lim [37]. It was already shown in [20] that the condition number diverges as one moves toward the open part of the boundary of the set of tensors of bounded rank, entailing that there exist regions with arbitrarily high condition number. One of the main results of this paper, Theorem 2, shows that such regions cannot be ignored: They are sufficiently large to cause the integral of the condition number over the set of rank-r tensors to diverge. In other words, one cannot neglect the regions where the condition number is so high that a CPD computed from a floating-point representation of a rank-r tensor in \(\mathbb {R}^{n_1 \times \cdots \times n_d}\), subject only to roundoff errors, is meaningless: a result similar in spirit to de Silva and Lim [37].
One may conclude from the above that, at least from the point of view of average stability of the problem, tensor rank decomposition is doomed to fail. However, if one only cares about the directions of the rank-1 terms in the decomposition, then the situation changes dramatically. The condition number associated with the computational problem “Given a rank-r identifiable tensor \(\mathcal {A}\) as in Eq. (1), output the set of normalized rank-1 tensors \(\left\{ \frac{\mathcal {A}_1}{\Vert \mathcal {A}_1\Vert },\ldots ,\frac{\mathcal {A}_r}{\Vert \mathcal {A}_r\Vert }\right\} \)” will be called the angular condition number \(\kappa _{\mathrm {ang}}(\mathcal {A})\). Analogously to the bound Eq. (2), one can show that when \(\kappa _{\mathrm {ang}}\) is finite, it is the smallest number such that
$$\begin{aligned} \min _{\pi \in \mathfrak {S}_r} \sqrt{\sum _{i=1}^{r} \left\| \frac{\mathcal {A}_i}{\Vert \mathcal {A}_i\Vert } - \frac{\mathcal {B}_{\pi (i)}}{\Vert \mathcal {B}_{\pi (i)}\Vert }\right\| ^2} \ \le \ \kappa _{\mathrm {ang}}(\mathcal {A})\, \Vert \mathcal {A} - \mathcal {B}\Vert + o(\Vert \mathcal {A} - \mathcal {B}\Vert ) \end{aligned}$$
for all rank-r tensors \(\mathcal {B} = \mathcal {B}_1 + \cdots + \mathcal {B}_r\) (with \(\mathcal {B}_i\) rank-1 tensors) in a sufficiently small open neighborhood of \(\mathcal {A}\). By Breiding and Vannieuwenhoven [24, Corollary 5.5], the same expression holds for all tensors \(\mathcal {B}\) in a small open neighborhood of \(\mathcal {A}\) if \(\mathcal {B}_1 + \cdots + \mathcal {B}_r\) is the best rank-r approximation of \(\mathcal {B}\).
We will prove in Theorem 3 that, at least in the case of rank-2 tensors, the angular condition number \(\kappa _{\mathrm {ang}}\) for random inputs is finite on average, contrary to the classic condition number \(\kappa \); in fact, the numerical experiments in Sect. 7 suggest that this finiteness of the average angular condition number extends to much higher ranks as well. In other words, on average we may expect to be able to recover the angular part of the CPD:
$$\begin{aligned} \left\{ \frac{\mathcal {A}_1}{\Vert \mathcal {A}_1\Vert },\ \ldots ,\ \frac{\mathcal {A}_r}{\Vert \mathcal {A}_r\Vert } \right\} , \end{aligned}$$
where \(\mathcal {A}_i\) is as in Eq. (1). One could conclude from this that a tensor decomposition algorithm should aim to produce the normalized rank-1 terms \(\frac{\mathcal {A}_i}{\Vert \mathcal {A}_i\Vert }\) from the tensor rank decomposition
$$\begin{aligned} \mathcal {A} = \lambda _1 \frac{\mathcal {A}_1}{\Vert \mathcal {A}_1\Vert } + \cdots + \lambda _r \frac{\mathcal {A}_r}{\Vert \mathcal {A}_r\Vert }, \quad \text {where } \lambda _i := \Vert \mathcal {A}_i\Vert , \end{aligned}$$
accurately. Once these terms are obtained, one can recover the \(\lambda _i\)’s by solving a linear system of equations. Since, as a general principle, the condition number of a composite smooth map \(g \circ f\) between manifolds satisfies [16, 28]
$$\begin{aligned} \kappa [g \circ f](x) := \Vert (\mathrm {d}{}_{f(x)} g) (\mathrm {d}{}_{x} f) \Vert \le \Vert \mathrm {d}{}_{f(x)} g \Vert \Vert \mathrm {d}{}_{x} f \Vert = \kappa [g](f(x)) \, \kappa [f](x), \end{aligned}$$
it follows that the condition number of tensor decomposition is bounded by the product of the condition numbers of the problem of finding the angular part of the CPD and the condition number of solving a linear least-squares problem. Our main results suggest that precisely the last problem will on average be ill-conditioned.
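As a small illustration of this two-step strategy (our sketch; the helper rank1 is ours, and the normalized rank-1 terms are assumed to be given), the coefficients \(\lambda _i\) follow from a linear least-squares problem whose columns are the vectorized normalized terms.
```python
import numpy as np

def rank1(vectors):
    t = vectors[0]
    for u in vectors[1:]:
        t = np.tensordot(t, u, axes=0)
    return t

rng = np.random.default_rng(2)
dims, r = (3, 3, 3), 2
terms = [[rng.standard_normal(n) for n in dims] for _ in range(r)]
A = sum(rank1(t) for t in terms)

# Suppose an algorithm has returned the normalized rank-1 terms A_i / ||A_i||.
units = [rank1(t) / np.linalg.norm(rank1(t)) for t in terms]

# Recover the lambda_i: each vectorized unit-norm term is one column of the
# least-squares system  M @ lam = vec(A).
M = np.column_stack([u.ravel() for u in units])
lam, *_ = np.linalg.lstsq(M, A.ravel(), rcond=None)

residual = np.linalg.norm(sum(l * u for l, u in zip(lam, units)) - A)
print(lam, residual)  # residual ~ 1e-15: the scalar coefficients are recovered
```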
The foregoing observation can have major implications for algorithm design. Indeed, solving the tensor rank decomposition problem by first solving for the angular part and then the linear least-squares problem decomposes the problem into a nonlinear and a linear part. Crucially, the latter least-squares problem can be solved by direct methods, such as a QR-factorization combined with a linear system solver. Such methods have a uniform computational cost regardless of the condition number of the problem. By contrast, since no (provably) numerically stable direct algorithms for tensor rank decomposition are currently known [11], iterative methods are indispensable for this problem. We may expect their computational performance to depend on the condition number of the problem instance. Indeed, our main results combined with the main result of Breiding and Vannieuwenhoven [21] imply, for example, that Riemannian Gauss–Newton optimization methods for solving the angular part of the CPD should, on average, require fewer iterations to reach convergence than Riemannian Gauss–Newton methods for solving the tensor decomposition problem directly (such as the methods in [21, 22]), because the angular condition number \(\kappa _{\mathrm {ang}}\) appears to be finite on average, while the regular condition number \(\kappa \) is proved to be \(\infty \) on average in most cases, as we show in this article.
Our main results also have consequences for researchers testing numerical algorithms for computing the CPD. In the literature, a common way of generating input data for testing algorithms is to sample the rank-1 terms \(\mathcal {A}_i\) randomly, and then apply the algorithm to the associated tensor \(\mathcal {A} = \mathcal {A}_1 + \cdots + \mathcal {A}_r\). However, our analysis in this paper and the analyses in [11, 23] show that this procedure generates tensors that are heavily biased toward being numerically well-conditioned. Hence, this way of testing algorithms probably does not correspond to a realistic distribution on the inputs. We acknowledge that it is currently not easy to sample rank-r tensors uniformly, even though some methods exist [18]. In part, this is because equations for the algebraic variety containing the tensors of rank bounded by r are hard to obtain [57]. Nevertheless, in Sect. 7, using the observation from Remark 2, we present an acceptance–rejection method that can be applied in a few cases and yields uniformly distributed rank-r tensors, relative to the Gaussian density in Definition 1. In any case, we strongly advocate that the (range of) condition numbers be reported when testing the performance of iterative methods for solving the tensor rank decomposition problem, so that one can assess the difficulty of the problem instances. We also recommend including models that are known to lead to instances with high condition numbers, such as those used in [20, 22].
The formal presentation of our main results requires some extra notation that we introduce in subsequent sections.

1.3 Identifiable Tensors and a Formula for the Condition Number

A particular feature of higher-order tensors that distinguishes them from matrices is identifiability. This means that in many cases the CPD of a tensor of order \(d\ge 3\) and small rank is unique. A tensor \(\mathcal {A}\) is called r-identifiable if there is a unique set \(\{\mathcal {A}_1,\ldots ,\mathcal {A}_r\}\) of cardinality r such that \(\mathcal {A} = \mathcal {A}_1 + \cdots + \mathcal {A}_r\) and all \(\mathcal {A}_i\)'s are rank-1 tensors. A celebrated criterion by Kruskal [55] gives a tool to decide whether a given tensor of order 3 satisfies this property.
Lemma 1
(Kruskal’s criterion [55, 64]) Let \(\mathbb {F}\) be \(\mathbb {R}\) or \(\mathbb {C}\), let \(\mathcal {A} \in \mathbb {F}^{n_1\times n_2\times n_3}\) be a tensor of order 3, and assume that \(\mathcal {A} = \sum _{i=1}^{r} \mathbf {u}_{i}^{1} \otimes \mathbf {u}_{i}^{2} \otimes \mathbf {u}_{i}^{3}\), where \(\mathbf {u}_{i}^{\ell } \in \mathbb {F}^{n_\ell }\). Define the factor matrices \(U_\ell = [\mathbf {u}_{i}^{\ell }]_{1\le i \le r} \in \mathbb {F}^{n_\ell \times r}\) for \(\ell =1,2,3\), and let \(k_\ell \) be the largest integer k such that every subset of k columns of \(U_\ell \) has rank equal to k. If \( r \le \frac{1}{2}( k_1 + k_2 + k_3 - 2) \) and \(k_1, k_2, k_3 > 1\), then the tensor \(\mathcal {A}\) is r-identifiable over \(\mathbb {F}\).
Since matrix rank does not change under a field extension from \(\mathbb {R}\) to \(\mathbb {C}\), a real rank-r tensor \(\mathcal {A}\) that satisfies the assumptions of Lemma 1 is r-identifiable over \(\mathbb {R}\) and also automatically r-identifiable over \(\mathbb {C}\). In other words, Kruskal's criterion certifies complex r-identifiability of tensors, which is a strictly stronger notion than r-identifiability over \(\mathbb {R}\) [5].
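Lemma 1 is straightforward to apply computationally: compute the Kruskal ranks \(k_\ell \) of the factor matrices and test the inequality. The brute-force sketch below (ours; exponential in r, so only suitable for small examples) does exactly that.
```python
import numpy as np
from itertools import combinations

def kruskal_rank(U, tol=1e-10):
    """Largest k such that every subset of k columns of U is linearly independent."""
    k = 0
    for size in range(1, U.shape[1] + 1):
        if all(np.linalg.matrix_rank(U[:, list(c)], tol=tol) == size
               for c in combinations(range(U.shape[1]), size)):
            k = size
        else:
            break
    return k

def satisfies_kruskal(U1, U2, U3):
    r = U1.shape[1]
    k1, k2, k3 = (kruskal_rank(U) for U in (U1, U2, U3))
    return min(k1, k2, k3) > 1 and 2 * r <= k1 + k2 + k3 - 2

rng = np.random.default_rng(3)
r, dims = 3, (4, 4, 4)
factors = [rng.standard_normal((n, r)) for n in dims]
print(satisfies_kruskal(*factors))  # True with probability 1 for generic factors
```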
Most order 3 tensors of low-rank satisfy Kruskal’s criterion [34]: There is an open dense subset of the set of rank-r tensors in \(\mathbb {R}^{n_1 \times n_2 \times n_3}\), \(n_1 \ge n_2 \ge n_3 \ge 2\), where complex r-identifiability holds, provided \(r \le n_1 + \min \{ \tfrac{1}{2} \delta , \delta \}\) with \(\delta := n_2 + n_3 - n_1 - 2\). In fact, this phenomenon occurs much more generally than third-order tensors of very small rank. Let us denote the set of complex tensors of complex rank bounded by r by
$$\begin{aligned} \sigma _{r;n_1,\ldots ,n_d}^{\mathbb {C}} := \left\{ \mathcal {A} \in \mathbb {C}^{n_1\times \cdots \times n_d} \mid \mathrm {rank}_{\mathbb {C}}(\mathcal {A}) \le r \right\} . \end{aligned}$$
This constructible set turns out to be an open dense subset (in the Euclidean topology) of its Zariski closure \(\overline{\sigma _{r;n_1,\ldots ,n_d}^\mathbb {C}}\); see [57]. One says that \(\sigma _{r;n_1,\ldots ,n_d}^\mathbb {C}\) is generically complex r-identifiable if the subset of points of \(\sigma _{r;n_1,\ldots ,n_d}^\mathbb {C}\) that are not complex r-identifiable is contained in a proper closed subset in the Zariski topology on the algebraic variety \(\overline{\sigma _{r;n_1,\ldots ,n_d}^\mathbb {C}}\); see [32]. It is known from dimensionality arguments [32] that there is a maximum value of r for which generic r-identifiability of \(\sigma _{r;n_1,\ldots ,n_d}\) can hold, namely
$$\begin{aligned} r \le r^{\text {crit}}_{n_1,\ldots ,n_d}, \quad \text { where }\quad r^{\text {crit}}_{n_1,\ldots ,n_d}:=\frac{n_1 \cdots n_d}{1 + \sum _{k=1}^d (n_k - 1)}. \end{aligned}$$
(3)
In fact, it is conjectured that the inequality is strict in general; see [47] for details. For all other values of r, generic r-identifiability does not hold. In [17, 32, 33, 40], it is proved that in the majority of choices for \(n_1, \ldots , n_d\), generic complex r-identifiability holds for most ranks with \(r < r_\text {crit}\); see [17, Theorem 7.2] for a result that is asymptotically optimal. For a summary of the conjecturally complete picture of complex r-identifiability results, see [34, Section 3].
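For concreteness, the critical value from Eq. (3) is elementary to evaluate; the following small helper (ours) computes \(r^{\text {crit}}_{n_1,\ldots ,n_d}\) for a given shape.
```python
from math import prod

def r_crit(dims):
    """Critical rank bound from Eq. (3): n_1*...*n_d / (1 + sum_k (n_k - 1))."""
    return prod(dims) / (1 + sum(n - 1 for n in dims))

print(r_crit((3, 3, 3)))     # 27/7  = 3.857...
print(r_crit((10, 10, 10)))  # 1000/28 = 35.71...
```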
Assumption 1
In the rest of this article, we will assume that \(\sigma _{r;n_1,\ldots ,n_d}^\mathbb {C}\) is generically complex r-identifiable.
The reason why we make this assumption is because it greatly simplifies some of the arguments. At the same time, Assumption 1 is (conjectured to be) extremely weak and only limits the generality in the exceptional cases listed in [33, Theorem 1.1], and even then generic r-identifiability only fails very close to the upper bound \(r_\text {crit}\) of the permitted ranks.
An immediate benefit of Assumption 1 is that it allows for a nice expression of the condition number of the tensor rank decomposition problem. Let us denote the set of rank-1 tensors in \(\mathbb {R}^{n_1\times \cdots \times n_d}\) by
$$\begin{aligned} \mathcal {S}_{n_1,\ldots ,n_d} = \{\mathbf {a}^{1} \otimes \cdots \otimes \mathbf {a}^{d} \mid \mathbf {a}^{k}\in \mathbb {R}^{n_k}\backslash \{0\}\}. \end{aligned}$$
It is a smooth manifold, called the Segre manifold [46, 57]. The set of tensors of rank bounded by r is the image of the addition map: \(\sigma _{r;n_1,\ldots ,n_d} = \varPhi (\mathcal {S}_{n_1,\ldots ,n_d}^{\times r})\), where
$$\begin{aligned} \varPhi :\ (\mathcal {S}_{n_1,\ldots ,n_d})^{\times r} \rightarrow \mathbb {R}^{n_1\times \cdots \times n_d}, \quad (\mathcal {A}_1,\ldots ,\mathcal {A}_r) \mapsto \mathcal {A}_1 + \cdots + \mathcal {A}_r. \end{aligned}$$
(4)
Then, under Assumption 1, there exists an open dense subset \(\mathcal {N}_{r;n_1,\ldots ,n_d}\) of \(\sigma _{r;n_1,\ldots ,n_d}\) such that for all \(\mathcal {A} \in \mathcal {N}_{r;n_1,\ldots ,n_d}\) the fiber \(\varPhi ^{-1}(\mathcal {A})\) is finite by Beltrán et al. [11, Proposition 4.5–4.7]. In particular, the points in the fiber are isolated, so there is a local inverse map \(\varPhi _a^{-1}\) of \(\varPhi \) for each \(a \in \varPhi ^{-1}(\mathcal {A})\). Recall from [20] that the condition number of the CPD at \(\mathcal {A}\) is then the condition number (in the classic sense of Rice [61]; see also [28, 69]) of any of these local inverses:
$$\begin{aligned} \kappa (\mathcal {A}) := \lim _{\epsilon \rightarrow 0}\ \sup _{\begin{array}{c} \mathcal {B} \in \mathcal {N}_{r;n_1,\ldots ,n_d}\\ 0<\Vert \mathcal {B}-\mathcal {A}\Vert \le \epsilon \end{array}}\ \frac{\Vert \varPhi _a^{-1}(\mathcal {B}) - \varPhi _a^{-1}(\mathcal {A})\Vert }{\Vert \mathcal {B}-\mathcal {A}\Vert } = \Vert \mathrm {d}{}_{\mathcal {A}} \varPhi _a^{-1}\Vert _2, \end{aligned}$$
(5)
where \(a \in \varPhi ^{-1}(\mathcal {A})\) is arbitrary; it is a corollary of Breiding and Vannieuwenhoven [20, Theorem 1.1] that the above definition does not depend on the choice of \(a\). Herein, \(\Vert \cdot \Vert \) in the denominator is the Euclidean norm induced by the ambient \(\mathbb {R}^{n_1\times \cdots \times n_d}\), and the norm in the numerator is the product norm of the Euclidean norms inherited from the ambient \(\mathbb {R}^{n_1\times \cdots \times n_d}\)’s. The right-hand side \(\Vert \mathrm {d}{}_{\mathcal {A}} \varPhi _a^{-1}\Vert _2\) is the spectral norm of the derivative of \(\varPhi _a^{-1}\) at \(\mathcal {A}\). See Sect. 2 for more details. By Breiding and Vannieuwenhoven [20, Proposition 4.4], the condition number \(\kappa (\mathcal {A})\) does not depend on the norm of \(\mathcal {A}\): \(\kappa (t\mathcal {A}) = \kappa (\mathcal {A})\) for \(t\in \mathbb {R}{\setminus }\{0\}\).
Remark 1
We did not specify the value of the condition number for \(\mathcal {A} \in \sigma _{r;n_1,\ldots ,n_d}{\setminus }\mathcal {N}_{r;n_1,\ldots ,n_d}\). The main reason is that our analysis is independent of the values that the condition number takes on this set of measure zero, so that for simplicity we decided against including the more complicated general case where there can be several distinct elements in the preimage.

1.4 Main Results

The goal of this paper is to study the average condition number relative to “reasonable” density functions. By this, we mean probability distributions \(\hat{\rho }\) that are comparable to the standard Gaussian density \(\rho \): There exist positive constants \(c_1,c_2\) such that \(c_1 \le \frac{{\hat{\rho }}}{\rho } \le c_2\). The main result, Theorem 4, applies, among others, to all distributions \({\hat{\rho }}\) comparable to the following Gaussian density defined on the set of bounded-rank tensors \(\sigma _{r;n_1,\ldots ,n_d}\).
Definition 1
(Gaussian identifiable tensors) We define a random variable \(\mathcal {A}\) on \(\sigma _{r;n_1,\ldots ,n_d}\) by specifying its density as
$$\begin{aligned} \rho (\mathcal {A}) := \frac{1}{C}\, e^{-\frac{\Vert \mathcal {A}\Vert ^2}{2}}, \quad \text {where}\quad C := \int _{\sigma _{r;n_1,\ldots ,n_d}} e^{-\frac{\Vert \mathcal {B}\Vert ^2}{2}}\, \mathrm {d}{}\mathcal {B} \end{aligned}$$
is the normalization constant. Under Assumption 1, if https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq143_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq144_HTML.gif , we say that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq145_HTML.gif is a Gaussian Identifiable Tensor (GIT) of rank r.
Remark 2
Suppose that r is a typical rank of tensors in \({\mathbb {R}}^{n_1\times \cdots \times n_d}\). This means that \(\sigma _{r;n_1,\ldots ,n_d}\) contains a Euclidean open subset of \({\mathbb {R}}^{n_1\times \cdots \times n_d}\) and is of maximum dimension \(n_1\cdots n_d\). Then, the distribution defined in Definition 1 is a conditional probability distribution: A GIT \(\mathcal {A}\) of rank r has the distribution of \(\mathcal {G}\) conditioned on the event that \(\mathcal {G}\) has rank r, where \(\mathcal {G}\) is a tensor with independent and identically distributed (i.i.d.) standard Gaussian entries. We exploit this fact in our numerical experiments to sample GITs using an acceptance–rejection method.
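A minimal sketch of this acceptance–rejection idea for the \(n\times n\times 2\) format (our illustration, not taken from the paper; this format is chosen only because its real rank can be decided by the classical matrix-pencil criterion, and the helper names are ours):
```python
import numpy as np

def rank_nn2(T, tol=1e-10):
    """Real rank of a generic n x n x 2 tensor with slices X1, X2: it equals n if
    X1^{-1} X2 has only real eigenvalues, and n + 1 otherwise (pencil criterion)."""
    X1, X2 = T[:, :, 0], T[:, :, 1]
    eig = np.linalg.eigvals(np.linalg.solve(X1, X2))
    return X1.shape[0] if np.max(np.abs(eig.imag)) < tol else X1.shape[0] + 1

def sample_git_nn2(n, r, rng):
    """Acceptance-rejection: draw i.i.d. Gaussian n x n x 2 tensors and return the
    first one of rank r (both r = n and r = n + 1 are typical ranks for this format)."""
    while True:
        T = rng.standard_normal((n, n, 2))
        if rank_nn2(T) == r:
            return T

rng = np.random.default_rng(4)
T = sample_git_nn2(3, 4, rng)  # a Gaussian 3 x 3 x 2 tensor of rank 4
print(rank_nn2(T))
```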
We first state our results for the foregoing Gaussian density. At the end of this subsection, in Theorem 4, we generalize these results to other densities, including all densities comparable to the Gaussian density. Our first contribution is the following result. We prove it in Sect. 3.
Theorem 1
Let \(\mathcal {A}\) be a GIT of rank \(r= 2\). Then, \(\mathbb {E}(\kappa (\mathcal {A})) = \infty .\)
It should be mentioned that in our analysis we consider only a small subset of \(\sigma _{2;n_1,\ldots ,n_d}\) and show that the condition number already integrates to infinity on this subset. For this reason, a weak average-case analysis as proposed in [3] would be of interest for this problem.
Under one additional assumption, we can extend the result from Theorem 1 to higher ranks. We prove the following theorem in Sect. 4.
Theorem 2
Let \(n_1,\ldots ,n_d\ge 3\). On top of Assumption 1, we assume that \(\sigma _{r-2;n_1-2,\ldots ,n_d-2}\) is generically complex \((r-2)\)-identifiable. Then, for a GIT \(\mathcal {A}\) of rank \(r \ge 3\), we have \(\mathbb {E}(\kappa (\mathcal {A})) = \infty .\)
By Bocci et al. [17, Theorem 7.2], the assumptions of Theorem 2 are satisfied in a large number of cases. In fact, as the size of the tensor increases, the assumptions become weaker: When \(n_1 \ge n_2 \ge \cdots \ge n_d \ge 2\), the conditions in Theorem 2 are satisfied for \(r\le \min (s_1,s_2)\) with
$$\begin{aligned} s_1=&\frac{n_1 n_2 - (n_1 + n_2 + n_3 - 2)}{n_1 n_2} r_{n_1,\ldots ,n_d}^{\text {crit}}\\ s_2=&\frac{(n_1-2) (n_2-2) - (n_1 + n_2 + n_3 - 8)}{(n_1-2) (n_2-2)} r_{n_1-2,\ldots ,n_d-2}^{\text {crit}} + 2. \end{aligned}$$
Note that for large \(n_i\), the second piece \(s_2\) is the most restrictive. Eq. (3) implies that \(r_{n_1-2,\ldots ,n_d-2}^\text {crit} = (1 - \delta _{n_1,\ldots ,n_d}){r_{n_1,\ldots ,n_d}^\text {crit}}\) with \(\delta _{n_1,\ldots ,n_d} = \mathcal {O}( \sum _{k=1}^d \frac{1}{n_k} )\). Therefore, we obtain the following asymptotically optimal result.
Corollary 1
Let \(d \ge 3\) be fixed, and \(n_1 \ge n_2 \ge \cdots \ge n_d \ge 2\). If \(n_1,\ldots ,n_d \rightarrow \infty \), then for a GIT \(\mathcal {A}\) of rank r we have \(\mathbb {E}(\kappa (\mathcal {A})) = \infty \) for all
$$\begin{aligned} 2 \le r < (1 - \epsilon _{n_1,\ldots ,n_d}) \, r_{n_1,\ldots ,n_d}^{\text {crit}}, \end{aligned}$$
where \( \lim _{n_1,\ldots ,n_d \rightarrow \infty } \epsilon _{n_1,\ldots ,n_d} = 0. \)
It follows from dimensionality arguments that if \(r>r_\text {crit}\), then the addition map \(\varPhi \) does not have a local inverse. In fact, in this case all of the connected components in the fiber of \(\varPhi \) at \(\mathcal {A}\) have positive dimension [46]. It follows from Breiding and Vannieuwenhoven [20] that the condition number of the tensor rank decomposition problem at each expression Eq. (1) of length r of such a tensor \(\mathcal {A}\) is \(\infty \). In this case, \(\kappa (\mathcal {A}) = \infty \), regardless of how the tensor decomposition problem is defined when \(\mathcal {A}\) has multiple distinct decompositions; see also the discussion in [28, Remark 14.14]. Consequently, the average condition number is infinite as well.
Our results lead us to conjecture that the expected condition number is infinite even without the additional assumption from Theorem 2 and without any upper bound on the rank.
Conjecture 1
Let \(\mathcal {A}\) be a GIT of rank \(r\ge 2\). Then, \(\mathbb {E}(\kappa (\mathcal {A})) = \infty .\)
Corollary 1 proves this conjecture asymptotically, in practice leaving only a small range of ranks for which it might fail.
As mentioned above, it turns out that for GITs the expected angular condition number is not always infinite. Formally, the angular condition number is defined as follows: Let the canonical projection onto the sphere be \(p: \mathbb {R}^{n_1\times \cdots \times n_d} \rightarrow \mathbb {S}(\mathbb {R}^{n_1\times \cdots \times n_d})\). Then, the angular condition number of \(\mathcal {A} \in \mathcal {N}_{r;n_1,\ldots ,n_d}\) is
$$\begin{aligned} \kappa _{\mathrm {ang}}(\mathcal {A}) := \Vert \mathrm {d}{}_{\mathcal {A}} \left( (p\times \cdots \times p) \circ \varPhi _a^{-1}\right) \Vert _2, \end{aligned}$$
(6)
where \(\varPhi _a^{-1}\) is an arbitrary local inverse of \(\varPhi \) with \(\varPhi _a^{-1}(\mathcal {A}) = a\). As before we do not specify what happens on the measure-zero set \(\sigma _{r;n_1,\ldots ,n_d}{\setminus }\mathcal {N}_{r;n_1,\ldots ,n_d}\), because it is not relevant for this paper. The angular condition number only accounts for the angular part of the CPD, i.e., the directions of the tensors, not for their magnitude, hence the name.
To distinguish the condition numbers in Eqs. (5) and (6), we will refer to the condition number from Eq. (5) as the regular condition number. Oftentimes we even drop the qualifier “regular.”
Here is the result for \(\kappa _{\mathrm {ang}}\) for tensors of rank two that we prove in Sect. 5.
Theorem 3
Let \(\mathcal {A}\) be a GIT of rank 2. Then, \(\mathbb {E}(\kappa _{\mathrm {ang}}(\mathcal {A})) < \infty \).
Unfortunately, we do not know if this theorem can be extended to higher rank tensors. However, based on our experiments in Sect. 7, we pose the following:
Conjecture 2
Let \(\mathcal {A}\) be a GIT of rank r. Then, \(\mathbb {E}(\kappa _{\mathrm {ang}}(\mathcal {A})) < \infty \).
We finally observe that the foregoing main results are not limited to GITs. They are valid for a wide range of distributions of random tensors.
Theorem 4
Theorems 1, 2, Corollary 1 and Theorem 3 are still true if instead of GITs we take random tensors defined by a wide range of other probability distributions, including some of interest such as:
1.
All probability distributions that are comparable to the standard Gaussian density \(\rho \). This means that the random tensor \(\mathcal {A}\) has a density \(\hat{\rho }\) for which there exist positive constants \(c_1,c_2\) such that \(c_1 \le \frac{{\hat{\rho }}}{\rho } \le c_2\).
 
2.
Uniformly randomly chosen \(\mathcal {A}\) in the unit sphere \({\mathbb {S}}(\sigma _r)\).
 
3.
Uniformly randomly chosen \(\mathcal {A}\) in the unit ball of \(\sigma _{r;n_1,\ldots ,n_d}\).
 

1.5 Organization of the Article

The rest of the article is organized as follows. In the next section, we give some preliminary material. Thereafter, in Sects. 3–6, we successively prove Theorems 1–4. In Sect. 7, we present numerical experiments supporting our main results. Finally, in Appendices A–C we give proofs for several lemmata that we need in the other sections.

2 Notation and Preliminaries

2.1 Notation

We will use the following typographic conventions for convenience: Vectors are typeset in a bold face (\(\mathbf {a}, \mathbf {b}\)), matrices in upper case (A, B), tensors in a calligraphic font (\(\mathcal {A}\), \(\mathcal {B}\)), and manifolds and linear spaces in a different calligraphic font (\(\mathcal {A}, \mathcal {B}\)).
The positive integer \(d \ge 2\) is reserved for the order of a tensor, \(n_1,\ldots ,n_d \ge 2\) are its dimensions, and \(r \ge 1\) is its rank. The following integers are used throughout the paper:
$$\begin{aligned} \varSigma := 1 + \sum _{k=1}^d (n_k - 1) \quad \text {and}\quad \varPi := \prod _{k=1}^d n_k; \end{aligned}$$
they correspond to the dimension of the Segre manifold \(\mathcal {S}_{n_1,\ldots ,n_d}\) and the dimension of the ambient space \(\mathbb {R}^{n_1 \times \cdots \times n_d}\), respectively. The symmetric group on r elements is denoted by \(\mathfrak {S}_r\).
We work exclusively with real vector spaces, for which \(\langle \cdot ,\cdot \rangle \) denotes the Euclidean inner product and \(\Vert \cdot \Vert \) always denotes the associated norm. We will switch freely between the finite-dimensional vector spaces \(\mathbb {R}^{n_1 \cdots n_d}\) and \(\mathbb {R}^{n_1 \times \cdots \times n_d}\) for representing tensors in the abstract vector space \(\mathbb {R}^{n_1} \otimes \cdots \otimes \mathbb {R}^{n_d}\). By the above choice of norms, all of these finite-dimensional Hilbert spaces are isometric; specifically, if \(\mathcal {A} \in \mathbb {R}^{n_1} \otimes \cdots \otimes \mathbb {R}^{n_d}\) and \(\mathbf {a} \in \mathbb {R}^{n_1 \cdots n_d}\) is its coordinate array with respect to an orthonormal basis, then \(\Vert \mathcal {A}\Vert = \Vert \mathbf {a}\Vert \). Similarly, if the coordinates \(\mathbf {a}\) are reshaped into a multidimensional array \(A \in \mathbb {R}^{n_1 \times \cdots \times n_d}\), then \(\Vert \mathcal {A}\Vert = \Vert A\Vert \). It is important to note that this notation can conflict with the usual meaning of \(\Vert A\Vert \) when \(d=2\); to distinguish the spectral norm from the standard norm in this paper, we write \(\Vert A\Vert _2\) for the former; see Eq. (7).
For matrices \(U_1 \in \mathbb {R}^{m_1\times n_1}, \ldots , U_d \in \mathbb {R}^{m_d\times n_d}\), the tensor product \(U_1\otimes \cdots \otimes U_d\) acts on rank-1 tensors as follows:
$$\begin{aligned} (U_1\otimes \cdots \otimes U_d)(\mathbf {u}^1\otimes \cdots \otimes \mathbf {u}^d) = (U_1\mathbf {u}^1)\otimes \cdots \otimes (U_d\mathbf {u}^d). \end{aligned}$$
By the universal property [44], this extends to a linear map \(\mathbb {R}^{n_1} \otimes \cdots \otimes \mathbb {R}^{n_d}\rightarrow \mathbb {R}^{m_1} \otimes \cdots \otimes \mathbb {R}^{m_d}\). Note that we can view \(U_1\otimes \cdots \otimes U_d \) as a matrix in \(\mathbb {R}^{(m_1\cdots m_d)\times (n_1\cdots n_d)}\).
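A quick numerical check of this compatibility (our sketch): realizing \(U_1\otimes U_2\otimes U_3\) as a Kronecker-product matrix acting on row-major vectorizations reproduces \((U_1\mathbf {u}^1)\otimes (U_2\mathbf {u}^2)\otimes (U_3\mathbf {u}^3)\).
```python
import numpy as np

def rank1(vectors):
    t = vectors[0]
    for v in vectors[1:]:
        t = np.tensordot(t, v, axes=0)
    return t

rng = np.random.default_rng(5)
m, n = (2, 3, 4), (3, 4, 5)
U = [rng.standard_normal((mi, ni)) for mi, ni in zip(m, n)]
u = [rng.standard_normal(ni) for ni in n]

# Left-hand side: (U1 x U2 x U3) applied to u1 x u2 x u3, realized as a
# Kronecker-product matrix acting on the row-major vectorization.
K = np.kron(np.kron(U[0], U[1]), U[2])
lhs = K @ rank1(u).ravel()

# Right-hand side: (U1 u1) x (U2 u2) x (U3 u3).
rhs = rank1([Ui @ ui for Ui, ui in zip(U, u)]).ravel()
print(np.linalg.norm(lhs - rhs))  # ~ 1e-14
```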
For any subset \(U \subset V\) of a normed vector space V, we define the sphere over U as
$$\begin{aligned} \mathbb {S}(U) := \left\{ \frac{\mathbf {u}}{\Vert \mathbf {u}\Vert } \mid \mathbf {u} \in U{\setminus }\{0\} \right\} \subset V. \end{aligned}$$
In particular, the unit sphere in \(\mathbb {R}^n\) is denoted by \(\mathbb {S}(\mathbb {R}^n)\).
Given an \(m \times n\) matrix R or a linear operator \(R:\mathbb {R}^n\rightarrow \mathbb {R}^m\), we denote the pseudo-inverse by \(R^\dagger \). The spectral norm and smallest singular value of R are denoted respectively by
$$\begin{aligned} \Vert R\Vert _2 := \max _{\mathbf {v}\in \mathbb {R}^n}\frac{\Vert R \mathbf {v}\Vert }{\Vert \mathbf {v}\Vert } \quad \text { and }\quad \varsigma _{\min }(R) := \min _{\mathbf {v}\in \mathbb {R}^n}\frac{\Vert R \mathbf {v}\Vert }{\Vert \mathbf {v}\Vert }. \end{aligned}$$
(7)
A special role will be played in this paper by the product of all but the smallest singular values of R, which we denote by q(R). In other words, if R is injective, then
$$\begin{aligned} q(R) := {\varsigma _1(R) \cdots \varsigma _{n-1}(R)} = \frac{\sqrt{\det (R^TR)}}{\varsigma _{\min }(R)}, \end{aligned}$$
(8)
where \(R^T\) is the transposed matrix (operator) and \(\varsigma _i(R)\) is the ith largest singular value of R.
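In terms of the singular value decomposition, Eq. (8) is straightforward to evaluate numerically; a small sketch (ours):
```python
import numpy as np

def q(R):
    """Product of all but the smallest singular value of an injective matrix R,
    i.e. sqrt(det(R^T R)) / sigma_min(R), as in Eq. (8)."""
    s = np.linalg.svd(R, compute_uv=False)  # singular values in descending order
    return np.prod(s[:-1])

rng = np.random.default_rng(6)
R = rng.standard_normal((7, 4))             # injective with probability 1
s = np.linalg.svd(R, compute_uv=False)
assert np.isclose(q(R), np.sqrt(np.linalg.det(R.T @ R)) / s[-1])
print(q(R))
```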

2.2 Differential Geometry

In this article, we only consider submanifolds of Euclidean spaces; see, e.g., [58] for the general definitions. A smooth (\(C^\infty \)) manifold is a topological manifold with a smooth structure, in the sense of Lee [58]. The tangent space \(\mathrm {T}_{x} {\, \mathcal {M}}\) at x to an embedded n-dimensional smooth submanifold \(\mathcal {M} \subset \mathbb {R}^N\) is the set
$$\begin{aligned} \left\{ \mathbf {v} \in \mathbb {R}^N \;|\; \exists \text { a smooth curve } \gamma (t) \subset \mathcal {M} \text { with } \gamma (0)=x: \mathbf {v} = \frac{\mathrm {d}{}}{\mathrm {d}{}t}\Big |_{t=0} \, \gamma (t) \right\} . \end{aligned}$$
At every point \(x \in \mathcal {M}\), there exist open neighborhoods \(\mathcal {V} \subset \mathcal {M}\) and \(\mathcal {U} \subset \mathrm {T}_{x} {\, \mathcal {M}}\) of x, and a bijective smooth map \(\phi : \mathcal {V} \rightarrow \mathcal {U}\) with smooth inverse. The tuple \((\mathcal {V},\phi )\) is a coordinate chart of \(\mathcal {M}\). A smooth map between manifolds \(F : \mathcal {M} \rightarrow \mathcal {N}\) is a map such that for every \(x \in \mathcal {M}\) and coordinate chart \((\mathcal {V},\phi )\) containing x, and every coordinate chart \((\mathcal {W}, \psi )\) containing F(x), the composition \(\psi \circ F \circ \phi ^{-1} : \phi (\mathcal {V}) \rightarrow \psi (F(\mathcal {V}))\) is a smooth map. The derivative of F can be defined as the linear map \(\mathrm {d}{}_{x} F : \mathrm {T}_{x} {\, \mathcal {M}} \rightarrow \mathrm {T}_{F(x)} {\, \mathcal {N}}\) taking the tangent vector \(\mathbf {v} \in \mathrm {T}_{x} {\, \mathcal {M}}\) to \(\frac{\mathrm {d}{}}{\mathrm {d}{}t}|_{t=0} F(\gamma (t)) \in \mathrm {T}_{F(x)} {\, \mathcal {N}}\), where \(\gamma (t) \subset \mathcal {M}\) is a curve with \(\gamma (0) = x\) and \(\gamma '(0) = \mathbf {v}\). If \(\dim \mathcal {M} = \dim \mathcal {N}\) and if \(\mathrm {d}{}_{x} F\) has full rank, there is a neighborhood \(\mathcal {W} \subset \mathcal {M}\) of x on which F is invertible with smooth inverse; that is, F is a diffeomorphism between \(\mathcal {W}\) and \(F(\mathcal {W})\). If this property holds for all \(x \in \mathcal {M}\), then F is called a local diffeomorphism.
A differentiable submanifold \(\mathcal {M}\subset \mathbb {R}^N\) can be equipped with a Riemannian metric g, turning it into a Riemannian manifold, allowing for the computation of integrals. The manifolds in this paper are all embedded submanifolds of Euclidean space, so the Riemannian metric for us will always be the metric inherited from the ambient space.

2.3 The Manifold of r-Nice Tensors

As in the introduction, the Segre manifold is
$$\begin{aligned} {\mathcal {S}}_{n_1,\ldots ,n_d} = \{\mathbf {u} ^1\otimes \cdots \otimes \mathbf {u}^d \mid \mathbf {u}^k\in \mathbb {R}^{n_k}\backslash \{0\}\}. \end{aligned}$$
It is a smooth manifold of dimension \(\varSigma \). Its tangent space is given by
$$\begin{aligned} \mathrm {T}_{\mathbf {u}^1\otimes \cdots \otimes \mathbf {u}^d} {\, \mathcal {S}_{n_1,\ldots ,n_d}} = \mathbb {R}^{n_1} \otimes \mathbf {u}^2 \otimes \cdots \otimes \mathbf {u}^d + \cdots +\mathbf {u}^1\otimes \cdots \otimes \mathbf {u}^{d-1}\otimes \mathbb {R}^{n_d}; \end{aligned}$$
(9)
note that this is not a direct sum.
The Euclidean inner product between rank-1 tensors is conveniently computed by the following formula (see, e.g., [45]):
$$\begin{aligned} \langle \mathbf {u}^1\otimes \cdots \otimes \mathbf {u}^d,\mathbf {v}^1\otimes \cdots \otimes \mathbf {v}^d\rangle = \prod _{i=1}^d \langle \mathbf {u}^i, \mathbf {v}^i\rangle . \end{aligned}$$
(10)
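Formula (10) is easy to confirm numerically (our sketch):
```python
import numpy as np

def rank1(vectors):
    t = vectors[0]
    for v in vectors[1:]:
        t = np.tensordot(t, v, axes=0)
    return t

rng = np.random.default_rng(7)
dims = (2, 3, 4)
u = [rng.standard_normal(n) for n in dims]
v = [rng.standard_normal(n) for n in dims]

lhs = np.sum(rank1(u) * rank1(v))                 # Euclidean inner product
rhs = np.prod([ui @ vi for ui, vi in zip(u, v)])  # right-hand side of Eq. (10)
print(abs(lhs - rhs))  # ~ 1e-15
```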
The set of tensors of rank at most r is denoted by
$$\begin{aligned} \sigma _{r;n_1,\ldots ,n_d} := \left\{ \mathcal {A} \in \mathbb {R}^{n_1\times \cdots \times n_d} \mid \mathrm {rank}(\mathcal {A}) \le r \right\} ; \end{aligned}$$
it is a semialgebraic set of dimension at most \(\min \{r\varSigma , \varPi \}\); see, e.g., [60]. Under Assumption 1, the dimension of \(\sigma _{r;n_1,\ldots ,n_d}\) is exactly \(r\varSigma \).
In [11, Section 4], we introduced an open dense subset of \(\sigma _{r;n_1,\ldots ,n_d}\) with favorable differential-geometric properties. We called it the manifold of r-nice tensors in [11, Definition 4.2]. Below, we present a slightly modified definition that is suitable for our present purpose; it eliminates conditions (4) and (5) from Beltrán et al. [11, Definition 4.2].
In what follows, we denote the real closure in the Zariski topology of a subset \(A \subset \mathbb {R}^\varPi \) by \(\overline{A}\). This is the real algebraic variety \(\overline{A}:=\overline{A}^{{\mathbb {C}}} \cap \mathbb {R}^\varPi \), where \(\overline{A}^{{\mathbb {C}}}\) is the closure of A in the Zariski topology in \(\mathbb {C}^\varPi \). By Whitney [70, Lemma 8], the real dimension of \(\overline{A}\) equals the complex dimension of \(\overline{A}^{{\mathbb {C}}}\).
Definition 2
Recall the addition map \(\varPhi \) defined in Eq. (4). Let \(\mathcal {M}_{r;n_1,\ldots ,n_d} \subset (\mathcal {S}_{n_1,\ldots ,n_d})^{\times r}\) be the subset of r-tuples \((\mathcal {A}_1,\ldots ,\mathcal {A}_r)\) of rank-1 tensors satisfying all of the following properties:
1.
\(\mathcal {A}_1 + \cdots + \mathcal {A}_r\) is a smooth point of the algebraic variety \(\overline{\sigma _{r;n_1,\ldots ,n_d}}\);
 
2.
\(\mathcal {A}_1 + \cdots + \mathcal {A}_r\) is complex r-identifiable; and
 
3.
\(\kappa (\mathcal {A}_1 + \cdots + \mathcal {A}_r) < \infty \).
 
The set of r-nice tensors is \(\mathcal {N}_{r;n_1,\ldots ,n_d}:=\varPhi (\mathcal {M}_{r;n_1,\ldots ,n_d})\).
Note that the third item in the definition is well defined because of the second item.
Proposition 1
If Assumption 1 holds, then the following statements are true:
1.
\(\mathcal {M}_{r;n_1,\ldots ,n_d}\) and \(\mathcal {N}_{r;n_1,\ldots ,n_d}\) are smooth manifolds of dimension \(r\varSigma \);
 
2.
\(\mathcal {M}_{r;n_1,\ldots ,n_d}\) is Zariski-open in \((\mathcal {S}_{n_1,\ldots ,n_d})^{\times r}\);
 
3.
\(\mathcal {N}_{r;n_1,\ldots ,n_d}\) is Zariski-open in \(\sigma _{r;n_1,\ldots ,n_d}\);
 
4.
the addition map \(\varPhi {\mid _{\mathcal {M}_{r;n_1,\ldots ,n_d}}}\) is a global diffeomorphism onto its image;
 
5.
\(\mathcal {N}_{r;n_1,\ldots ,n_d}\) is closed under multiplication by nonzero scalars; and
 
6.
\(\mathcal {M}_{r;n_1,\ldots ,n_d} \subset \mathbb {R}^{n_1 \times \cdots \times n_d} \times \cdots \times \mathbb {R}^{n_1 \times \cdots \times n_d}\) and \(\mathcal {N}_{r;n_1,\ldots ,n_d} \subset \mathbb {R}^{n_1 \times \cdots \times n_d}\) are embedded submanifolds.
 
Proof
Items 1, 2, 3, and 6 are proved as follows. Let \(X_1\) and \(X_2\) be, respectively, the set of tensors in \(\sigma _{r;n_1,\ldots ,n_d}\) which are not complex r-identifiable and which are not smooth points of \(\overline{\sigma _{r;n_1,\ldots ,n_d}}\). Both are Zariski-closed in \(\overline{\sigma _{r;n_1,\ldots ,n_d}}\) under Assumption 1, and hence so are the preimages \(\varPhi ^{-1}(X_1)\) and \(\varPhi ^{-1}(X_2)\). Moreover, the third defining condition of \(\mathcal {M}_{r;n_1,\ldots ,n_d}\) is also Zariski-closed in \((\mathcal {S}_{n_1,\ldots ,n_d})^{\times r}\) from the explicit formula for the condition number Eq. (12). Hence, \(\mathcal {M}_{r;n_1,\ldots ,n_d}\) is Zariski-open. An open subset of an embedded submanifold is itself an embedded submanifold so the claim for \(\mathcal {M}_{r;n_1,\ldots ,n_d}\) is proved. Moreover, the dimension of the complement of \(\mathcal {M}_{r;n_1,\ldots ,n_d}\) is at most \(r\varSigma -1\) and so its image by the rational map \(\varPhi \) is contained in an algebraic set of dimension at most \(r\varSigma -1\), thus proving that \(\mathcal {N}_{r;n_1,\ldots ,n_d}\) is also Zariski-open and indeed an embedded submanifold of the set of smooth points of \(\overline{\sigma _{r;n_1,\ldots ,n_d}}\), which is itself an embedded submanifold of its affine ambient space, see [12, Proposition 3.2.9].
The fourth item is due to the definition of the condition number, the fact that it is finite on \(\mathcal {N}_{r;n_1,\ldots ,n_d}\) by Definition 2, and the injectivity of \(\varPhi |_{\mathcal {M}_{r;n_1,\ldots ,n_d}}\) by Definition 2 (2).
The fifth item follows by noting that the three defining properties of \(\mathcal {N}_{r;n_1,\ldots ,n_d}\) are all true independent of a nonzero scaling. \(\square \)
Remark 3
The definition of r-nice tensors in [11, Definition 4.2] involves two more requirements, but those are not needed here.
Since the tangent space of \(\mathcal {N}_{r;n_1,\ldots ,n_d}\) at a point is the image of the derivative of the local diffeomorphism \(\varPhi \), we have the following characterization:
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ11_HTML.png
(11)

2.4 Sensitivity of CPDs

The condition number of the problem of computing the rank-1 terms of a CPD of a tensor was studied in a general setting in [20]; the following characterization of the condition number is Theorem 1.1 of [20]. Let https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq323_HTML.gif , where the https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq324_HTML.gif are rank-1 tensors. For each i let \(U_i\) be a matrix whose columns form an orthonormal basis of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq326_HTML.gif . Then,
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ12_HTML.png
(12)
The matrix \(U=[U_1, \ldots , U_r]\in \mathbb {R}^{\varPi \times r\varSigma }\) is also called a Terracini matrix. An explicit expression for the \(U_i\)’s is given in [20, equation (5.1)].
Since https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq329_HTML.gif uniquely depends on https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq330_HTML.gif , we can view the condition number of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq331_HTML.gif as a function of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq332_HTML.gif :
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ13_HTML.png
(13)
where the matrices \(U_i\) are as before. The benefit of Eq. (13) is that it is well-defined for any tuple https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq334_HTML.gif (and not just those mapping into \(\mathcal {N}_{r,n_1,\ldots ,n_d}\)).
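To make the characterizations in Eqs. (12) and (13) concrete, the following NumPy sketch assembles, for each rank-1 term with unit-norm factors, an orthonormal basis of the tangent space to the Segre cone (the rank-1 direction itself together with the blocks built from orthonormal bases of \((\mathbf {u}^k)^\perp \)), stacks these bases into a Terracini matrix, and returns the reciprocal of its smallest singular value. The helper names are ours; this is only an illustration of the formula, not the implementation used for the experiments in this paper.
```python
import numpy as np
from functools import reduce

def _unit(v):
    return v / np.linalg.norm(v)

def _orth_complement(u):
    # orthonormal basis (as columns) of the hyperplane orthogonal to the unit vector u
    Q, _ = np.linalg.qr(np.column_stack([u, np.random.randn(len(u), len(u) - 1)]))
    return Q[:, 1:]

def terracini_block(factors):
    # orthonormal basis of the tangent space to the Segre cone at u^1 x ... x u^d:
    # the rank-1 direction itself plus, for each mode k, the block built by
    # replacing u^k with an orthonormal basis of (u^k)^perp
    cols = [reduce(np.kron, [u.reshape(-1, 1) for u in factors])]
    for k in range(len(factors)):
        blocks = [u.reshape(-1, 1) for u in factors]
        blocks[k] = _orth_complement(factors[k])
        cols.append(reduce(np.kron, blocks))
    return np.hstack(cols)                      # shape Pi x Sigma

def condition_number(decomposition):
    # kappa of Eq. (13): reciprocal of the smallest singular value of U = [U_1 ... U_r]
    U = np.hstack([terracini_block(f) for f in decomposition])
    return 1.0 / np.linalg.svd(U, compute_uv=False)[-1]

# example: a random rank-2 decomposition in R^{3 x 3 x 3}
rng = np.random.default_rng(0)
decomposition = [[_unit(rng.standard_normal(3)) for _ in range(3)] for _ in range(2)]
print(condition_number(decomposition))
```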

2.5 Integrals

For fixed \(t\in (0,1]\) and a point \(\mathbf {y} \in \mathbb {S}(\mathbb {R}^n)\), the spherical cap of radius t around \(\mathbf {y}\) is defined as \(\mathrm {cap}(\mathbf {y},t) := \{\mathbf {x} \in \mathbb {S}(\mathbb {R}^n): {\langle \mathbf {x}, \mathbf {y}\rangle \; >\sqrt{ 1-t^2}}\}\). Its volume satisfies
$$\begin{aligned} c_1{(n)}t^{n-1}\le \mathrm {vol}(\mathrm {cap}(\mathbf {y},t))\le c_2{(n)}t^{n-1} \end{aligned}$$
(14)
for some positive constants \(0<c_1{(n)}<c_2{(n)}\).
The following general lemma will be useful later.
Lemma 2
Let \(u,v>0\) be fixed. Then, \(0<\int _0^\infty t^u \,e^{-\frac{(t + v)^2}{2}}\,\mathrm {d}{} t<\infty .\)
Proof
It is clear that the integral is not zero. Furthermore, since \((t + v)^2 > t^2+v^2\) for \(t,v>0\), we see that \(\int _0^\infty t^u \,e^{-\frac{(t + v)^2}{2}}\,\mathrm {d}{} t \le \int _0^\infty t^u \,e^{-\frac{t^2 + v^2}{2}}\,\mathrm {d}{} t = e^{-\frac{v^2}{2}}\sqrt{2}^{u-1}\varGamma (\frac{u+1}{2})\), which is finite. \(\square \)
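As a quick numerical sanity check of the closed form used in this proof (not needed for the argument), one can compare a quadrature of \(\int _0^\infty t^u e^{-\frac{t^2+v^2}{2}}\,\mathrm {d}t\) with \(e^{-\frac{v^2}{2}}\sqrt{2}^{\,u-1}\varGamma (\frac{u+1}{2})\). A minimal NumPy sketch, with the values of u and v chosen arbitrarily:
```python
import numpy as np
from math import gamma, sqrt, exp

def trapz(f, x):
    # simple trapezoidal rule
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)

u, v = 2.5, 1.3
t = np.linspace(0.0, 40.0, 400_001)        # the integrand is negligible beyond t = 40
numeric = trapz(t**u * np.exp(-(t**2 + v**2) / 2.0), t)
closed = exp(-v**2 / 2.0) * sqrt(2.0)**(u - 1) * gamma((u + 1) / 2.0)
print(numeric, closed)                      # the two values agree up to quadrature error
```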

2.6 The Coarea Formula

Let \(\mathcal {M}\) and \(\mathcal {N}\) be submanifolds of \(\mathbb {R}^n\) of equal dimension, and let \(F: \mathcal {M} \rightarrow \mathcal {N}\) be a smooth surjective map. A point \(y\in \mathcal {N}\) is called a regular value of F if for all points \(x\in F^{-1}(y)\) the differential \(\mathrm {d}{}_{x} F\) is of full rank. The preimage \(F^{-1}(y)\) of a regular value y is a discrete set of points. Let \(|F^{-1}(y)|\) be the number of elements in this preimage. Then, the coarea formula [52] states that for every integrable function g we have
$$\begin{aligned} \int _{\mathcal {N}} |F^{-1}(y)| \, g(y) \,\mathrm {d}{}y = \int _{\mathcal {M}} \mathrm {Jac}(F)(x) \, g(F(x)) \,\mathrm {d}{}x, \end{aligned}$$
(15)
where \(\mathrm {Jac}(F)(x):=\vert \det \mathrm {d}{}_{x} F\vert \) is the Jacobian determinant of F at x. Note that almost all \(y\in \mathcal {N}\) are regular values of F by Sard’s theorem [58, Theorem 6.10]. Hence, integrating over \(\mathcal {N}\) is the same as integrating over all regular values of F.
Remark 4
In [52], the coarea formula is given in the more general case when \(\dim \mathcal {M} \ge \dim \mathcal {N}\). In this article, we only need the case when \(\dim \mathcal {M}\) and \(\dim \mathcal {N}\) coincide. Moreover, if F is injective, then Eq. (15) reduces to the well-known change-of-variables formula.
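As a toy illustration of Eq. (15) (our own example; the choices below are assumptions made only for this sketch), take \(\mathcal {M} = (-1,1){\setminus }\{0\}\), \(\mathcal {N} = (0,1)\), and \(F(x) = x^2\). Every \(y \in \mathcal {N}\) is then a regular value with \(|F^{-1}(y)| = 2\) and \(\mathrm {Jac}(F)(x) = |2x|\), and both sides of the coarea formula can be compared numerically:
```python
import numpy as np

def trapz(f, x):
    # simple trapezoidal rule
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)

g = lambda y: np.exp(-y)                     # any integrable test function

y = np.linspace(0.0, 1.0, 200_001)
lhs = trapz(2.0 * g(y), y)                   # int_N |F^{-1}(y)| g(y) dy,  |F^{-1}(y)| = 2

x = np.linspace(-1.0, 1.0, 200_001)
rhs = trapz(np.abs(2.0 * x) * g(x**2), x)    # int_M Jac(F)(x) g(F(x)) dx, Jac(F)(x) = |2x|

print(lhs, rhs)                              # both are approximately 2(1 - e^{-1})
```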

3 The Average Condition Number of Gaussian Tensors of Rank Two

The goal of this section is to prove Theorem 1. We will proceed in three steps. First, the 2-nice tensors are conveniently parameterized via elementary manifolds such as one-dimensional intervals and spheres in Sect. 3.1. Second, the Jacobian determinant of this map is computed in Sect. 3.2. Third, the integral is bounded from below with the help of a few technical auxiliary lemmas in Sect. 3.3. In the next section, we will exploit Theorem 1 to generalize the argument to most higher ranks. To simplify notation, in this section we let
$$\begin{aligned} \mathcal {S}:= \mathcal {S}_{n_1,\ldots ,n_d},\;\sigma _2:=\sigma _{2,n_1,\ldots ,n_d},\;\mathcal {N}_2:=\mathcal {N}_{2,n_1,\ldots ,n_d} \;\text {and}\;\mathcal {M}_2:=\mathcal {M}_{2,n_1,\ldots ,n_d}. \end{aligned}$$

3.1 Parameterizing 2-Nice Tensors

Let
$$\begin{aligned} \mathcal {P}:=\mathbb {S}{}(\mathbb {R}^{n_1}) \times \cdots \times \mathbb {S}{}(\mathbb {R}^{n_d}) \end{aligned}$$
(16)
and consider the next parameterization of the Segre manifold:
$$\begin{aligned} \psi : (0, \infty ) \times \mathcal {P} \rightarrow \mathcal {S},\; (\lambda , \mathbf {u}^1,\ldots ,\mathbf {u}^d) \mapsto \lambda \cdot \mathbf {u}^1 \otimes \cdots \otimes \mathbf {u}^d. \end{aligned}$$
(17)
The preimage of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq362_HTML.gif has cardinality https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq363_HTML.gif . By composing \(\varPsi := \psi \times \psi \) with the addition map from Eq. (4), we get the following alternative representation of tensors of rank bounded by 2:
$$\begin{aligned} (\varPhi \circ \varPsi )((\lambda , \mathbf {u}^1,\ldots ,\mathbf {u}^d), (\mu , \mathbf {v}^1,\ldots ,\mathbf {v}^d)) = \lambda \cdot \mathbf {u}^1 \otimes \cdots \otimes \mathbf {u}^d + \mu \cdot \mathbf {v}^1 \otimes \cdots \otimes \mathbf {v}^d. \end{aligned}$$
We would like to apply the coarea formula Eq. (15) to pull back the integral of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq365_HTML.gif over \(\sigma _{2}\) via the parameterization \(\varPhi \circ \varPsi \). However, \(\sigma _{2}\) in general is not a manifold, so the formula does not apply. Nevertheless, we can use the manifold \(\mathcal {N}_{2}\) of 2-nice tensors instead. By Proposition 1(3), \(\mathcal {N}_2\) is Zariski open in \(\sigma _2\), so that
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ167_HTML.png
where \(C_2 := C_{2;n_1,\ldots ,n_d}\) is as in Definition 1. By applying the coarea formula Eq. (15) to the smooth map \(\varPhi \mid _{\mathcal {M}_2}\), we get
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ168_HTML.png
where https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq374_HTML.gif is the Jacobian determinant of \(\varPhi \) at https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq376_HTML.gif . In the first equality, we used https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq377_HTML.gif for 2-identifiable tensors; indeed, we have that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq378_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq379_HTML.gif because https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq380_HTML.gif has rank equal to 2.
In the following, we switch to the notation from Eq. (13): https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq381_HTML.gif . Since \(\mathcal {M}_{2}\) is also Zariski open in \(\mathcal {S} \times \mathcal {S}\) by Proposition 1(2), we may replace the integral over \({\mathcal {M}}_{2}\) by an integral over \(\mathcal {S} \times \mathcal {S}\), thus obtaining
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ169_HTML.png
We use the coarea formula again, but this time for \(\varPsi = \psi \times \psi \), where \(\psi \) is the parameterization from Eq. (17). Note that for https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq388_HTML.gif we have https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq389_HTML.gif . We get
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ18_HTML.png
(18)
where https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq390_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq391_HTML.gif are both tuples in \((0,\infty ) \times \mathcal {P}\). Next, we compute the Jacobian determinant https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq393_HTML.gif .

3.2 Computing the Jacobian Determinant

Note that the dimension of the domain of \(\varPhi \circ \varPsi \) is equal to \(2\varSigma \). As above, let https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq396_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq397_HTML.gif be tuples in \((0,\infty ) \times \mathcal {P}\) with \(\mathcal {P}\) as in Eq. (16). In the following, we write
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ170_HTML.png
The Jacobian determinant of \(\varPhi \circ \varPsi \) at https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq401_HTML.gif is, by definition, the absolute value of the determinant of the linear map
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ171_HTML.png
Consider the matrix of partial derivatives of \(\varPhi \circ \varPsi \) with respect to the standard orthonormal basis of \(\mathbb {R}^{n_1\times \cdots \times n_d}\):
$$\begin{aligned} Q := \begin{bmatrix} L&M \end{bmatrix}\in \mathbb {R}^{\varPi \times 2\varSigma }, \end{aligned}$$
(19)
where
$$\begin{aligned} L := \begin{bmatrix} \frac{\partial (\varPhi \circ \varPsi )}{\partial \mathbf {u}^1}&\ldots&\frac{\partial (\varPhi \circ \varPsi )}{\partial \mathbf {u}^d}&\frac{\partial (\varPhi \circ \varPsi )}{\partial \mathbf {v}^1}&\ldots&\frac{\partial (\varPhi \circ \varPsi )}{\partial \mathbf {v}^d}\end{bmatrix} \quad \text {and}\quad M:= \begin{bmatrix} \frac{\partial (\varPhi \circ \varPsi )}{\partial \lambda }&\frac{\partial (\varPhi \circ \varPsi )}{\partial \mu } \end{bmatrix}. \end{aligned}$$
(20)
Then, the Jacobian determinant of \(\varPhi \circ \varPsi \) at https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq405_HTML.gif is
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ21_HTML.png
(21)
The latter is the volume of the parallelepiped spanned by the columns of Q. We fix notation in the next definition.
Definition 3
Let \(N{\,\ge \,}n\) be positive integers and \(U\in {\mathbb {R}}^{N\times n}\) be a matrix with columns \(\mathbf {u}_1,\ldots ,\mathbf {u}_n\in {\mathbb {R}}^N\). We denote by \(\mathrm {vol}(U)\) the volume of the parallelepiped spanned by the \(\mathbf {u}_i\):
$$\begin{aligned} \mathrm {vol}(U) := \sqrt{\det (U^TU)}. \end{aligned}$$
We can now rewrite Eq. (21) as
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ172_HTML.png
The reason why we write the partial derivatives of \(\varPhi \circ \varPsi \) with respect to the standard basis of \(\mathbb {R}^{n_1\times \cdots \times n_d}\) is that we get the following convenient description:
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ22_HTML.png
(22)
For describing L, let for each \(1\le k \le d\),
$$\begin{aligned} \dot{U}^k = \begin{bmatrix} \dot{\mathbf{u }}_{2}^{k}&\dot{\mathbf{u }}_{3}^{k}&\cdots&\dot{\mathbf{u }}_{n_k}^{k} \end{bmatrix} \in \mathbb {R}^{n_k \times {(}n_k-1{)}} \quad \text {and}\quad \dot{V}^k = \begin{bmatrix} \dot{\mathbf{v }}_{2}^{k}&\dot{\mathbf{v }}_{3}^{k}&\cdots&\dot{\mathbf{v }}_{n_k}^{k} \end{bmatrix} \in \mathbb {R}^{n_k \times {(}n_k-1{)}} \end{aligned}$$
be matrices whose columns form an ordered orthonormal basis of \((\mathbf {u}^k)^\perp = \mathrm {T}_{\mathbf {u}^k} {\, \mathbb {S}(\mathbb {R}^{n_k})}\) and of \((\mathbf {v}^k)^\perp = \mathrm {T}_{\mathbf {v}^k} {\, \mathbb {S}(\mathbb {R}^{n_k})}\), respectively. Then, by linearity and the product rule of differentiation, we have that \(L = \begin{bmatrix} \lambda L_1&\mu L_2 \end{bmatrix}\) is the block matrix consisting of two blocks of the form
$$\begin{aligned} L_1 = \begin{bmatrix} L_1^1&\cdots&L_1^d \end{bmatrix} \text { with } L_1^k&= \mathbf {u}_{}^{1} \otimes \cdots \otimes \mathbf {u}_{}^{k-1} \otimes \dot{U}^k \otimes \mathbf {u}_{}^{k+1} \otimes \cdots \otimes \mathbf {u}_{}^{d} \text { and }\\ L_2 = \begin{bmatrix} L_2^1&\cdots&L_2^d \end{bmatrix} \text { with } L_2^k&= \mathbf {v}_{}^{1} \otimes \cdots \otimes \mathbf {v}_{}^{k-1} \otimes \dot{V}^k \otimes \mathbf {v}_{}^{k+1} \otimes \cdots \otimes \mathbf {v}_{}^{d}.\nonumber \end{aligned}$$
(23)
Both \(L_1\) and \(L_2\) have \(\sum _{k=1}^d (n_k-1) = \varSigma -1\) columns. Note that M depends only on the \(\mathbf {u}^k\)’s and \(\mathbf {v}^k\)’s, whereas L also depends on the parameters \(\lambda \) and \(\mu \); we do not emphasize these dependencies in the notation.
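The structure of Q can be made explicit in code. The NumPy sketch below (helper names are ours) builds the blocks \(L_1^k\) and \(L_2^k\) of Eq. (23) from orthonormal bases of \((\mathbf {u}^k)^\perp \) and \((\mathbf {v}^k)^\perp \), takes the two columns of M to be the partial derivatives with respect to \(\lambda \) and \(\mu \), namely \(\mathbf {u}^1\otimes \cdots \otimes \mathbf {u}^d\) and \(\mathbf {v}^1\otimes \cdots \otimes \mathbf {v}^d\), stacks \(Q = \begin{bmatrix} \lambda L_1&\mu L_2&M\end{bmatrix}\) as in Eq. (19), and evaluates the volume of Definition 3.
```python
import numpy as np
from functools import reduce

def _unit(v):
    return v / np.linalg.norm(v)

def _orth_complement(u):
    # orthonormal basis of (u)^perp = T_u S(R^n), as columns
    Q, _ = np.linalg.qr(np.column_stack([u, np.random.randn(len(u), len(u) - 1)]))
    return Q[:, 1:]

def rank_one_and_block(factors):
    # returns the rank-1 direction u^1 x ... x u^d (as a column) and the block
    # [L^1 ... L^d] of Eq. (23), built from orthonormal bases of (u^k)^perp
    direction = reduce(np.kron, [u.reshape(-1, 1) for u in factors])
    blocks = []
    for k in range(len(factors)):
        fs = [u.reshape(-1, 1) for u in factors]
        fs[k] = _orth_complement(factors[k])
        blocks.append(reduce(np.kron, fs))
    return direction, np.hstack(blocks)

rng = np.random.default_rng(1)
n = [3, 4, 5]
us = [_unit(rng.standard_normal(nk)) for nk in n]
vs = [_unit(rng.standard_normal(nk)) for nk in n]
lam, mu = 2.0, 0.5

a, L1 = rank_one_and_block(us)
b, L2 = rank_one_and_block(vs)
M = np.hstack([a, b])                               # partial derivatives w.r.t. lambda and mu
Q = np.hstack([lam * L1, mu * L2, M])               # Q = [L  M] with L = [lam*L1  mu*L2], Eq. (19)

print(Q.shape)                                      # (60, 20), i.e. (Pi, 2*Sigma)
print(np.allclose(L1.T @ L1, np.eye(L1.shape[1])))  # the columns of L1 are orthonormal
print(np.sqrt(np.linalg.det(Q.T @ Q)))              # vol(Q) as in Definition 3
```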
Comparing with Breiding and Vannieuwenhoven [20, equation (5.1)], we see that the matrix \(L_1\) has as columns an orthonormal basis for the orthogonal complement of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq425_HTML.gif in https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq426_HTML.gif . Analogously, the columns of \(L_2\) form an orthonormal basis for the orthogonal complement of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq428_HTML.gif in https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq429_HTML.gif . Consequently, for https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq430_HTML.gif , Terracini’s matrix from Eq. (12) can be chosen as
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ24_HTML.png
(24)
This entails that
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ25_HTML.png
(25)
having used the notation from Definition 3 and the fact that singular values are invariant under orthogonal transformations such as permutations of columns.

3.3 Bounding the Integral

We are now ready to conclude the proof of Theorem 1 by showing that the expected value of the condition number of tensor rank decomposition is infinite; we do so by bounding the relevant integral from below by a divergent integral.
By Eq. (12), the condition number at https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq431_HTML.gif is the inverse of the smallest singular value of the Terracini matrix U from Eq. (24). Therefore, if we plug Eqs. (24) and (25) into (18), then we get
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ26_HTML.png
(26)
where \( {q(U) = \frac{\mathrm {vol}(U)}{\varsigma _{\min }(U)}} \) is as in Eq. (8), and
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ27_HTML.png
(27)
From Eq. (24), it is clear that U is a function of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq433_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq434_HTML.gif but is independent of \(\lambda \) and \(\mu \). Therefore, when we integrate first over \(\lambda \) and \(\mu \), the factor q(U) can be treated as a constant. In Appendix A.1, we compute this inner integral; the result is stated here as the next lemma.
Lemma 3
Let \((\mathbf {u}^1,\ldots ,\mathbf {u}^d), (\mathbf {v}^1,\ldots ,\mathbf {v}^d) \in \mathcal {P}\) be fixed. Then,
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ173_HTML.png
where https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq440_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq441_HTML.gif .
The foregoing integral can be bounded from below by exploiting the next lemma, which is proved in Appendix A.2.
Lemma 4
Let \(\mathbf {x},\mathbf {y}\in \mathbb {S}(\mathbb {R}^p)\) be two unit-norm vectors and \(s\ge 1\). Then, there exists a constant \(k=k(p,s)\) independent of \(\mathbf {x},\mathbf {y}\) such that
$$\begin{aligned} \int _0^{\frac{\pi }{2}}\frac{(\cos (\theta ) \sin (\theta ) )^{s-1}}{\Vert \cos (\theta )\mathbf {x}+\sin (\theta )\mathbf {y}\Vert ^{2s}}\,\mathrm {d}{}\theta \ge \frac{k}{{\Vert \mathbf {x} + \mathbf {y}\Vert ^{2s-1}}}. \end{aligned}$$
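The behavior asserted by Lemma 4 can be observed numerically: as \(\mathbf {y}\) approaches \(-\mathbf {x}\), the integral blows up at least at the rate \(\Vert \mathbf {x}+\mathbf {y}\Vert ^{-(2s-1)}\), so the ratio of the left-hand side to \(\Vert \mathbf {x}+\mathbf {y}\Vert ^{-(2s-1)}\) stays bounded away from zero. A rough NumPy check, with the parameters below chosen arbitrarily and not taken from the paper:
```python
import numpy as np

def trapz(f, x):
    return float(np.sum((f[1:] + f[:-1]) * np.diff(x)) / 2.0)

def lhs(x_vec, y_vec, s, m=400_001):
    # quadrature of  int_0^{pi/2} (cos sin)^{s-1} / ||cos(t) x + sin(t) y||^{2s} dt
    th = np.linspace(0.0, np.pi / 2, m)
    num = (np.cos(th) * np.sin(th))**(s - 1)
    den = np.linalg.norm(np.outer(np.cos(th), x_vec) + np.outer(np.sin(th), y_vec), axis=1)**(2 * s)
    return trapz(num / den, th)

s = 2.0
x = np.array([1.0, 0.0])
for alpha in [1.0, 0.3, 0.1, 0.03, 0.01]:
    y = np.array([-np.cos(alpha), np.sin(alpha)])   # so that ||x + y|| is of order alpha
    print(alpha, lhs(x, y, s) * np.linalg.norm(x + y)**(2 * s - 1))
```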
Combining the foregoing lemmata and plugging the result into Eq. (26), we obtain
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ174_HTML.png
Next, we exploit the symmetry of the domain \(\mathbb {S}(\mathbb {R}^{n_1})\) by flipping the sign of \(\mathbf {v}^1\) and, hence, of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq448_HTML.gif . This substitution transforms U into UD, where D is a diagonal matrix with some pattern of \(\pm 1\) on the diagonal. Since D is orthogonal, \(q(U) = q(UD)\), so that
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ175_HTML.png
Denote this last integral by J; it then remains to show that \(J=\infty \). Consider the open set
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ176_HTML.png
Since \(D(\epsilon )\) is open, we have
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ28_HTML.png
(28)
We now need two lemmata. The first one is straightforward.
Lemma 5
Let \(\epsilon >0\) be sufficiently small. For all https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq454_HTML.gif with https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq455_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq456_HTML.gif , we have
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ177_HTML.png
where https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq457_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq458_HTML.gif .
Proof
For proving the upper bound, apply the triangle inequality to the telescoping sum
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ178_HTML.png
and exploit \(\Vert \mathbf {u}^k - \mathbf {v}^k\Vert \le \Vert \mathbf {u}^1 -\mathbf {v}^1\Vert \) for all \(k=1,\ldots ,d\). The lower bound follows from
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ179_HTML.png
having used \(0 < \langle \mathbf {u}^k, \mathbf {v}^k \rangle \le 1\) for sufficiently small \(\epsilon \). \(\square \)
The second one is the final piece of the puzzle. We prove it in Appendix A.3.
Lemma 6
For sufficiently small \(\epsilon >0\), we have for all https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq465_HTML.gif with https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq466_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq467_HTML.gif that
$$\begin{aligned} q(U) \ge {2^{-d/2}} \left( \frac{\Vert \mathbf {u}^1-\mathbf {v}^1\Vert }{2}\right) ^{\varSigma -1}, \end{aligned}$$
where U is the matrix that depends on https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq468_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq469_HTML.gif as in Eq. (24) and q is as in Eq. (8).
Combining Lemmas 5 and 6 with Eq. (28), we find
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ180_HTML.png
where \(c>0\) is some constant. Note that the integrand in this equation only depends on \(\mathbf {u}^1\) and \(\mathbf {v}^1\). By definition of \(D(\epsilon )\), for each \(2\le k \le d\), if we fix \(\mathbf {u}^k\), then the domain of integration of \(\mathbf {v}^k\) contains the difference of two spherical caps of respective affine radii \(\frac{9}{10} \Vert \mathbf {u}^1 - \mathbf {v}^1 \Vert \) and \(\Vert \mathbf {u}^1 - \mathbf {v}^1 \Vert \). From Eq. (14), the volume of this difference of caps is greater than a constant times \(\Vert \mathbf {u}^1 - \mathbf {v}^1 \Vert ^{n_k-1}\). Therefore, if we keep \(\mathbf {u}^1,\mathbf {v}^1 \in \mathbb {S}(\mathbb {R}^{n_1})\) constant and integrate over \(\mathbf {u}^k, \mathbf {v}^k \in \mathbb {S}(\mathbb {R}^{n_k})\), \(k=2,\ldots ,d\), then we get
$$\begin{aligned} J \ge c' \underset{\begin{array}{c} \mathbf {u}^1,\mathbf {v}^1\in \mathbb {S}(\mathbb {R}^{n_1}),\\ \Vert \mathbf {u}^1-\mathbf {v}^1\Vert \le \epsilon \end{array}}{\int } \frac{1}{\Vert \mathbf {u}^1-\mathbf {v}^1\Vert ^{\varSigma - ((n_2-1)+\cdots +(n_d-1))}}\,\mathrm {d}{}\mathbf {u}^1\,\mathrm {d}{}\mathbf {v}^1, \end{aligned}$$
where \(c'>0\) is a constant. Recall that \(\varSigma = 1 + \sum _{k=1}^d (n_k - 1)\), so that
$$\begin{aligned} J\ge c'\underset{\mathbf {u}^1\in \mathbb {S}(\mathbb {R}^{n_1})}{\int }\;\; \underset{\begin{array}{c} \mathbf {v}^1\in \mathbb {S}(\mathbb {R}^{n_1}),\\ \Vert \mathbf {u}^1-\mathbf {v}^1\Vert \le \epsilon \end{array}}{\int } \frac{1}{\Vert \mathbf {u}^1-\mathbf {v}^1\Vert ^{n_1}}\,\mathrm {d}{}\mathbf {v}^1 \,\mathrm {d}{}\mathbf {u}^1. \end{aligned}$$
By rotational invariance, the inner integral does not depend on \(\mathbf {u}^1\). Moreover, for small \(\epsilon \) we may project through the stereographic projection, whose Jacobian is bounded above and below by positive constants near its center, and conclude that, for some other constant \(c''\),
$$\begin{aligned} J\ge c''\underset{\begin{array}{c} \mathbf {x}\in \mathbb {R}^{n_1-1},\\ \Vert \mathbf {x}\Vert \le \epsilon /2 \end{array}}{\int } \frac{1}{\Vert \mathbf {x}\Vert ^{n_1}}\,d\mathbf {x} =c''\mathrm {vol}({\mathbb {S}}(\mathbb {R}^{n_1-1}))\int _0^{\epsilon /2}\frac{r^{n_1-2}}{r^{n_1}}\,\mathrm {d}r =\infty . \end{aligned}$$
This proves \(J = \infty \), so that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq489_HTML.gif for tensors of rank bounded by 2, constituting a proof of Theorem 1.

4 The Average Condition Number: From Rank 2 to Higher Ranks

Having established that the average condition number of tensor rank decomposition of rank 2 tensors is infinite, we extend this result to higher ranks. That is, we will prove Theorem 2. As before, we abbreviate \(\mathcal {S}:= \mathcal {S}_{n_1,\ldots ,n_d}\), \(\sigma _r:=\sigma _{r,n_1,\ldots ,n_d}\), \(\mathcal {N}_r:=\mathcal {N}_{r,n_1,\ldots ,n_d}\), and \(\mathcal {M}_r:=\mathcal {M}_{r,n_1,\ldots ,n_d}.\)
We proceed with an observation that is of independent interest.
Lemma 7
Let https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq494_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq495_HTML.gif be \(n_1 \times \cdots \times n_d\) tensors, where the https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq497_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq498_HTML.gif are rank-1 tensors. If https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq499_HTML.gif is \((r+s)\)-identifiable, then we have
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ181_HTML.png
Proof
First, we observe that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq501_HTML.gif is r-identifiable, and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq502_HTML.gif is s-identifiable. Indeed, if the tensor https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq503_HTML.gif is \((r+s)\)-identifiable, then the unique set C of cardinality \(|C| \le r\) consisting of rank-1 tensors summing to https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq506_HTML.gif is https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq507_HTML.gif . If https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq508_HTML.gif had an alternative decomposition https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq509_HTML.gif , potentially of a shorter length \(r' \le r\), then https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq511_HTML.gif would be an alternative decomposition of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq512_HTML.gif . Hence, https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq513_HTML.gif needs to equal https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq514_HTML.gif , so that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq515_HTML.gif is r-identifiable. By symmetry, the result for https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq516_HTML.gif follows. For all i, let \(U_i\) be a matrix with orthonormal columns that span https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq518_HTML.gif , and \(V_i\) be a matrix with orthonormal columns that span https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq520_HTML.gif . Consider the matrices \(U=[U_1,\ldots ,U_r]\) and \(V=[V_1,\ldots ,V_s]\). By Eq. (12), we have
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ182_HTML.png
The claim follows from standard interlacing properties of singular values; see [51, Chapter 3]. \(\square \)
The next simple lemma is immediate.
Lemma 8
Consider the map https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq524_HTML.gif The following holds.
1.
For \(r > 2\), we have \(\phi (\sigma _2\times \mathcal {S}^{\times (r-2)}) = \sigma _r\).
 
2.
Let https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq527_HTML.gif be r-identifiable. Then, https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq528_HTML.gif .
 
Finally, the next lemma is the key to Theorem 2, providing a lower bound for the Jacobian determinant of \(\phi \) in a special open subset of \(\sigma _2 \times \mathcal {S}^{\times (r-2)}\). We postpone its proof to Appendix B.
Lemma 9
On top of Assumption 1 we assume that \(\sigma _{r-2; n_1-2,\ldots ,n_d-2}\) is generically complex identifiable. Then, there are constants \(\mu ,\epsilon , \nu _1, \ldots , \nu _{r-2} >0\) depending only on \(r,n_1,\ldots ,n_d\) with the following property: For all  https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq534_HTML.gif there exists a tuple https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq535_HTML.gif with https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq536_HTML.gif and
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ183_HTML.png
where \(\phi \) is as in Lemma 8.
Remark 5
Given any https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq538_HTML.gif , by taking a sequence https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq539_HTML.gif converging to https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq540_HTML.gif one can generate the corresponding sequences https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq541_HTML.gif from Lemma 9. Now, by compactness we can find an accumulation point https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq542_HTML.gif . Since \(\mathrm {Jac}(\phi )\) is continuous and hence uniformly continuous when restricted to a compact set, by choosing small enough \(\epsilon \) we can assure that for all https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq545_HTML.gif , https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq546_HTML.gif and for all https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq547_HTML.gif , we have https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq548_HTML.gif , where \(\epsilon \) and \(\mu \) do not depend on https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq551_HTML.gif .
Now, we prove Theorem 2.
Proof of Theorem 2
Recall the surjective map \(\phi : \sigma _2\times {\mathcal {S}^{\times (r-2)}} \rightarrow \sigma _r\) from Lemma 8. From Theorem 1 and the fact that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq553_HTML.gif for \(t>0\), there exists a tensor https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq555_HTML.gif such that for every \(\delta >0\) we have
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ184_HTML.png
From Lemma 9 and Remark 5, there exist tensors https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq557_HTML.gif such that
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ185_HTML.png
for all https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq558_HTML.gif such that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq559_HTML.gif , and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq560_HTML.gif . Let \(\mathcal {U}\subseteq \mathcal {N}_2\times \mathcal {S}^{\times r-2}\) be the set of all https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq562_HTML.gif satisfying the foregoing conditions. From Lemma 7, we have
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ186_HTML.png
Moreover, by Lemma 9 and the inverse function theorem, by taking \(\epsilon \) and \(\delta \) small enough we can assume that \(\phi |_\mathcal {U}\) is a diffeomorphism onto its image, and hence \(\phi (\mathcal {U})\) is open. The coarea formula Eq. (15) thus applies, yielding
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ187_HTML.png
The theorem follows since \(\phi (\mathcal {U})\subseteq \sigma _r\). \(\square \)

5 The Angular Condition Number of Tensor Rank Decomposition

In this section, we prove Theorem 3. As in the previous section, to ease notation, we abbreviate \(\mathcal {M}_2:= \mathcal {M}_{2;n_1,\ldots ,n_d}\), \(\mathcal {N}_2:= \mathcal {N}_{2;n_1,\ldots ,n_d}\), \(\mathcal {S}_2:= \mathcal {S}_{2;n_1,\ldots ,n_d}\), and \(\sigma _2:= \sigma _{2;n_1,\ldots ,n_d}\).

5.1 A Characterization of the Angular Condition Number as a Singular Value

We first derive a formula for the angular condition number in terms of singular values, similar to the one from Eq. (12). Recall from Eq. (6) that the angular condition number for rank \(r=2\) is
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ188_HTML.png
where \(p: \mathbb {R}^{n_1\times \cdots \times n_d} \rightarrow \mathbb {S}(\mathbb {R}^{n_1\times \cdots \times n_d})\) is the canonical projection onto the sphere and where https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq576_HTML.gif is a local inverse of \(\varPhi :\mathcal {S} \times \mathcal {S} \rightarrow \sigma _2\) at https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq578_HTML.gif with https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq579_HTML.gif . As before, the value of \(\kappa _{\mathrm {ang}}\) on \(\sigma _2{\setminus }\mathcal {N}_2\) is not relevant for our analysis, so we do not specify it.
Proposition 2
Under Assumption 1, let https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq582_HTML.gif , where for \(1\le k\le d\) we have \(\mathbf {u}^k,\mathbf {v}^k\in \mathbb {S}(\mathbb {R}^{n_k})\). Recall from Eq. (20) the definitions of the matrices M and L, associated to https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq585_HTML.gif . The following equality holds:
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ189_HTML.png
provided that the right-hand side is finite.
Proof
By Proposition 1, any local inverse https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq586_HTML.gif is differentiable at https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq587_HTML.gif . The projection p is also differentiable, so that
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ190_HTML.png
where \(\Vert \cdot \Vert _2\) is the spectral norm from Eq. (7). We compute this norm.
Let https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq589_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq590_HTML.gif . Then, by linearity of the derivative, we have https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq591_HTML.gif . Furthermore, for \(i=1,2\), the derivative https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq593_HTML.gif is the orthogonal projection onto the orthogonal complement of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq594_HTML.gif in \(\mathbb {R}^\varPi \). According to this, we decompose https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq596_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq597_HTML.gif as
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ191_HTML.png
Then, we have https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq598_HTML.gif and, consequently,
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ29_HTML.png
(29)
Recall from Eq. (20) the matrices \(L=\begin{bmatrix} \lambda L_1&\mu L_2\end{bmatrix}\) and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq600_HTML.gif . We can find vectors \(\mathbf {x}_1,\mathbf {x}_2 \in \mathbb {R}^{\varSigma -1}\) with https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq602_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq603_HTML.gif , and such that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq604_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq605_HTML.gif . Observe that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq606_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq607_HTML.gif . This yields
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ30_HTML.png
(30)
Writing
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ192_HTML.png
Since we are assuming that \((\mathrm {I}- MM^\dagger )L\) is injective (equivalently, that \(\varsigma _{\min }((\mathrm {I}-MM^\dagger )L)\ne 0\)), it has a left inverse, and we can write
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ31_HTML.png
(31)
Combining Eqs. (30) and (31), we see that
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ193_HTML.png
where the second equality follows from \( \left( PL\right) ^\dagger P= \left( PL\right) ^\dagger \), a basic property of the Moore–Penrose pseudo-inverse that holds for any orthogonal projector P. This finishes the proof. \(\square \)

5.2 Proof of Theorem 3

We now turn to the proof of Theorem 3. Proceeding exactly as in Sect. 3.1 and using Proposition 2, we get
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ32_HTML.png
(32)
where \(C_2 = C_{2;n_1,\ldots ,n_d}\) is as in Definition 1, \(\mathcal {P}\) is as in Eq. (16), \(Q = \begin{bmatrix} L&M \end{bmatrix}\) is as in Eq. (19), the volume \(\mathrm {vol}\) is as in Definition 3, and
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ194_HTML.png
is as in Eq. (27), so that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq616_HTML.gif . Next, we relate \(\mathrm {vol}(Q)\) to the volume of \((\mathrm {I}- M M^\dagger )L\).
Lemma 10
We have \( \mathrm {vol}(Q) = \mathrm {vol}(M)\,\mathrm {vol}((\mathrm {I}-MM^\dagger )L). \)
Proof
Let \(Q^\perp \) be a matrix whose columns contain an orthonormal basis for the orthogonal complement of the column span of Q. Then, from the definition,
$$\begin{aligned} \mathrm {vol}(Q) = \mathrm {vol}\bigl ( \begin{bmatrix} Q&Q^\perp \end{bmatrix} \bigr ) =\bigl |\det \bigl ( \begin{bmatrix} M&L&Q^\perp \end{bmatrix} \bigr )\bigr | =\left| \det \left( \begin{bmatrix} M&L&Q^\perp \end{bmatrix} \begin{bmatrix} \mathrm {I}&{}-M^\dagger L&{}0\\ 0&{}\mathrm {I}&{}0\\ 0&{}0&{}\mathrm {I}\end{bmatrix}\right) \right| , \end{aligned}$$
where in the last step we just multiplied by a matrix whose determinant is 1. Performing the inner multiplication, we then get
$$\begin{aligned} \mathrm {vol}(Q)= \bigl |\det \bigl (\begin{bmatrix} M&(\mathrm {I}-MM^\dagger )L&Q^\perp \end{bmatrix} \bigr )\bigr | = \mathrm {vol}\bigl ( \begin{bmatrix} M&(\mathrm {I}-MM^\dagger )L \end{bmatrix} \bigr ). \end{aligned}$$
These two blocks are mutually orthogonal, since \((\mathrm {I}-MM^\dagger )\) is the projection on the orthogonal complement of the span of M, and hence, the volume is the product of the volumes corresponding to each block. The assertion follows. \(\square \)
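Lemma 10 is easy to verify numerically for generic blocks. A minimal NumPy sketch, with the matrix sizes below chosen arbitrarily:
```python
import numpy as np

def vol(A):
    # vol(A) = sqrt(det(A^T A)) as in Definition 3
    return np.sqrt(np.linalg.det(A.T @ A))

rng = np.random.default_rng(2)
N, a, b = 30, 4, 7                        # ambient dimension and block widths, a + b <= N
M = rng.standard_normal((N, a))
L = rng.standard_normal((N, b))
Q = np.hstack([L, M])                     # Q = [L  M]

P = np.eye(N) - M @ np.linalg.pinv(M)     # orthogonal projector onto the complement of span(M)
print(vol(Q), vol(M) * vol(P @ L))        # the two numbers agree, as claimed in Lemma 10
```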
We use Lemma 10 to rewrite Eq. (32) as
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ33_HTML.png
(33)
where q is as in Eq. (8). Recall from Eq. (22) that M is independent of \(\lambda \) and \(\mu \). We first compute the integral over \(\lambda , \mu \) using the next lemma. We prove the lemma in Appendix C.1.
Lemma 11
Let \(L_1,L_2\) be the matrices defined as in Eq. (23), such that \(L=\begin{bmatrix} \lambda L_1&\mu L_2\end{bmatrix}\). Let
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ195_HTML.png
Then,
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ196_HTML.png
Inserting the results from this lemma into Eq. (33), we get
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ197_HTML.png
where
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ198_HTML.png
In the remaining part of this section, we show that \(J_\mathrm {outer}\) is bounded by a constant, which will conclude the proof. We do this by giving a sequence of upper bounds. We have no hope of providing sharp bounds, so rather than keeping track of all the constants, we will use the following definition to streamline the proof.
Definition 4
For \(A,B\in [0,\infty ]\), we will write \(A \preceq B\) if \(B\in \mathbb {R}\) implies \(A\in \mathbb {R}\). That is, \(A\preceq B\) is equivalent to the statement "\(B<\infty \Rightarrow A<\infty \)."
First, note that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq635_HTML.gif , so that
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ199_HTML.png
Next, we exploit the symmetry of \(\mathbb {S}(\mathbb {R}^{n_1})\) and transform \(\mathbf {v}^1 \mapsto -\mathbf {v}^1\). This transformation flips the sign of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq638_HTML.gif , but the value of q is not affected. Indeed, the matrix \(\mathrm {I}- M M^\dagger \) still projects onto https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq640_HTML.gif , and \(L_2\) is transformed into \(L_2 D\), where D is a diagonal matrix with some pattern of \(\pm 1\) on the diagonal. Since \(\left[ {\begin{matrix}I &{} \\ &{} D\end{matrix}}\right] \) is an orthogonal transformation, the singular values do not change. Thus, we obtain
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ200_HTML.png
The next lemma is proved in Appendix C.2.
Lemma 12
Let \(\theta \in [0,\tfrac{\pi }{2}]\) and fix https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq646_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq647_HTML.gif . There is a constant \(K>0\), depending only on \(n_1,\ldots ,n_d\) and d, such that
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ34_HTML.png
(34)
The lemma implies
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ35_HTML.png
(35)
For bounding the integral over \(\theta \), we need the next lemma, which we prove in Appendix C.3.
Lemma 13
Let \(a>1,p\ge 1\). There exists a constant \(K>0\), depending only on a, such that for any unit vectors \(\mathbf {x},\mathbf {y}\in \mathbb {S}{}(\mathbb {R}^p)\), \(\mathbf {x}\ne \mathbf {y}\), we have
$$\begin{aligned} \int _{0}^{\frac{\pi }{2}}\frac{1}{{\Vert \cos (\theta ) \mathbf {x}-\sin (\theta ) \mathbf {y}\Vert ^{a}}} \mathrm {d}{}\theta \le \frac{K}{\Vert \mathbf {x}-\mathbf {y}\Vert ^{a-1}}. \end{aligned}$$
Applying this lemma to Eq. (35), we obtain
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ201_HTML.png
Writing https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq655_HTML.gif , we arrive at
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ202_HTML.png
By orthogonal invariance, we may fix \(\mathbf {u}^k\in \mathbb {S}(\mathbb {R}^{n_k})\) to be \(\mathbf {u}^k = (1,0,\ldots ,0)\), and integrate the constant function 1 over one copy of \(\mathbb {S}{}(\mathbb {R}^{n_1}) \times \cdots \times \mathbb {S}{}(\mathbb {R}^{n_d})\). Ignoring the product of volumes \(\prod _{k=1}^d\mathrm {vol}(\mathbb {S}{}(\mathbb {R}^{n_k}))\) we have
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ203_HTML.png
Now, this spherical integral is particularly simple because the integrand depends only on one component of each vector. One can thus transform each integral over a sphere into an integral over an interval (see, for example, [10, Lemma 1]), getting:
$$\begin{aligned} J_\mathrm {outer}\preceq \int _{t_1,\ldots ,t_d\in [-1,1]}\frac{(1-t_1^2)^{\frac{n_1-1}{2}-1}\cdots (1-t_d^2)^{\frac{n_d-1}{2}-1}}{\sqrt{1-t_1\cdots t_d}^{\varSigma -2}}\,\mathrm {d}{}t_1\cdots \mathrm {d}{}t_d. \end{aligned}$$
For this last integral, we consider the partition of the cube \([-1,1]^d\) into \(2^d\) pieces corresponding to the different signs of the coordinates. In the pieces where the number of negative coordinates is odd, the denominator of the integrand is bounded below by 1, and thus the whole integrand is bounded above by 1. Hence, it suffices to check that the integral over the remaining pieces is bounded. Assume now that \(t_{i_1},\ldots ,t_{i_k}\) with \(k\ge 2\) even are the negative coordinates in some particular piece of the partition. The mapping that leaves all coordinates fixed but maps \(t_{i_{k-1}} \mapsto -t_{i_{k-1}}\) and \(t_{i_{k}} \mapsto -t_{i_{k}}\) preserves the integrand and moves the domain to another piece of the partition with \(k-2\) negative coordinates. This process can be repeated until none of the coordinates is negative. All in all, we have
$$\begin{aligned} J_\mathrm {outer}\preceq \int _{t_1,\ldots ,t_d\in [0,1]}\frac{(1-t_1^2)^{\frac{n_1-1}{2}-1}\cdots (1-t_d^2)^{\frac{n_d-1}{2}-1}}{\sqrt{1-t_1\cdots t_d}^{\varSigma -2}}\,\mathrm {d}{}t_1\cdots \mathrm {d}{}t_d. \end{aligned}$$
The change of variables \(t_k=\cos (\theta _k)\) for \(1\le k\le d\) converts this last integral into
$$\begin{aligned} \int _{\theta _1,\ldots ,\theta _d\in [0,\frac{\pi }{2}]}\frac{\sin (\theta _1)^{n_1-2}\cdots \sin (\theta _d)^{n_d-2}}{\sqrt{1-\cos (\theta _1)\cdots \cos (\theta _d)}^{\varSigma -2}}\,\mathrm {d}{}\theta _1 \cdots \mathrm {d}{}\theta _d. \end{aligned}$$
(36)
The next lemma is proved in Appendix C.4.
Lemma 14
Let \(d\ge 1\) and \(\theta _1,\ldots ,\theta _d\in [0,\tfrac{\pi }{2}]\). Then, \( \cos (\theta _1)\cdots \cos (\theta _d)\le 1-\frac{\theta _1^2+\cdots +\theta _d^2}{7d}. \)
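The inequality of Lemma 14 can be sanity-checked by random sampling before consulting the proof in the appendix; a small NumPy sketch, with the sample size and the value of d chosen arbitrarily:
```python
import numpy as np

rng = np.random.default_rng(3)
d = 4
worst = np.inf
for _ in range(100_000):
    theta = rng.uniform(0.0, np.pi / 2, size=d)
    slack = (1.0 - theta @ theta / (7 * d)) - np.prod(np.cos(theta))
    worst = min(worst, slack)
print(worst >= 0.0, worst)   # the claimed bound holds at every sampled point
```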
Using the lemma and the inequality \(\sin (\theta )\le \theta \) for \(0\le \theta \le \tfrac{\pi }{2}\), we find that the integral in Eq. (36) is bounded by a constant times the following integral:
$$\begin{aligned} \int _{\theta _1,\ldots ,\theta _d\in [0,\frac{\pi }{2}]}\frac{\theta _1^{n_1-2}\cdots \theta _d^{n_d-2}}{\sqrt{\theta _1^2+\cdots + \theta _d^2}^{\,\varSigma -2}}\,\mathrm {d}{}\theta _1 \cdots \mathrm {d}{}\theta _d. \end{aligned}$$
Changing the name of the variables to \(x_1,\ldots ,x_d\) and integrating over the d-dimensional ball of radius \(\tfrac{\pi }{2}\sqrt{d}\), which contains the domain \([0,\tfrac{\pi }{2}]^d\), we get a new upper bound for the last integral, which implies
$$\begin{aligned} J_\mathrm {outer}\preceq \int _{x_1^2+\cdots +x_d^2\le \frac{\pi ^2d}{4}}\frac{x_1^{n_1-2}\cdots x_d^{n_d-2}}{\sqrt{x_1^2+\cdots + x_d^2}^{\, \varSigma -2}}\,\mathrm {d}{}x_1 \cdots \mathrm {d}{}x_d. \end{aligned}$$
Recall that \(\varSigma = 1 + \sum _{j=1}^d (n_j-1)\), so that the powers of \(\rho \) in the integrand below cancel. By passing to polar coordinates, we get
$$\begin{aligned} J_\mathrm {outer}&\preceq \int _{x\in \mathbb {S}(\mathbb {R}^d)}x_1^{n_1-2}\cdots x_d^{n_d-2}\int _0^{\frac{\pi \sqrt{d}}{2}}\frac{\rho ^{d-1+\sum _{j=1}^d(n_j-2)}}{\rho ^{-1 + \sum _{j=1}^d (n_j-1)}}\,\mathrm {d}{}\rho \,\mathrm {d}{}x_1\cdots \mathrm {d}{}x_d\\&\le \mathrm {vol}(\mathbb {S}(\mathbb {R}^d)) \int _{0}^{\frac{\pi \sqrt{d}}{2}} 1 \,\mathrm {d}{}\rho = \mathrm {vol}(\mathbb {S}(\mathbb {R}^d)) \frac{\pi \sqrt{d}}{2} < \infty . \end{aligned}$$
This shows \(J_\mathrm {outer} < \infty \) implying https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq679_HTML.gif , finishing the proof of Theorem 3. \(\square \)

6 Other Random Tensors: Proof of Theorem 4

We demonstrate how our main results can be extended to many other distributions as well.
Consider the first item of Theorem 4. We assume that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq681_HTML.gif has the density \(\hat{\rho }\) and that there exist positive constants \(c_1,c_2\) such that \(c_1 \le \frac{{\hat{\rho }}}{\rho } \le c_2\), where \(\rho \) is the density of a GIT. Then, for any measurable function https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq686_HTML.gif we have
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ204_HTML.png
and
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ205_HTML.png
Thus, https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq687_HTML.gif if and only if https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq688_HTML.gif . Replacing f by \(\kappa \) and \(\kappa _{\mathrm {ang}}\) proves the first part of Theorem 4.
By Breiding and Vannieuwenhoven [20, Proposition 4.4], \(\kappa \) is invariant under multiplication of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq692_HTML.gif by a scalar. Therefore, the expected value of \(\kappa \) for the Gaussian is equal to the expected value when https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq694_HTML.gif is chosen uniformly in the unit ball, and also when https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq695_HTML.gif is chosen uniformly in the unit sphere of the space of tensors. Namely, we have (see, e.g., [28, Section 2.2.4])
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ206_HTML.png
This proves the second and third item of Theorem 4 for \(\kappa \).
For \(\kappa _{\mathrm {ang}}\), we need the following lemma.
Lemma 15
If https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq698_HTML.gif is an r-nice tensor, then https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq699_HTML.gif for all \(t>0\).
Proof
Since https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq701_HTML.gif is r-nice, we have https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq702_HTML.gif . Similarly to Eq. (29), we can show where https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq704_HTML.gif is the CPD of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq705_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq706_HTML.gif is the corresponding decomposition of the tangent vector. The derivative https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq707_HTML.gif is the orthogonal projection onto https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq708_HTML.gif and is independent of scaling. Moreover, \(\sigma _{r;n_1,\ldots ,n_d}\) is a cone, and so https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq710_HTML.gif can be identified with https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq711_HTML.gif . This shows that after scaling the tensor https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq712_HTML.gif we get and hence https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq714_HTML.gif .
\(\square \)
Now, we can prove the rest of Theorem 4. Recall from Definition 1 that the density of a GIT on \(\sigma _{r;n_1,\ldots ,n_d}\) is , where https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq718_HTML.gif . Since our results for \(\kappa _{\mathrm {ang}}\) are for rank-2 tensors, we put \(r=2\) in the following. We also abbreviate \(\sigma _2:=\sigma _{2;n_1,\ldots ,n_d}\) and \(C_2:=C_{2;n_1,\ldots ,n_d}\). Then, using Lemma 15 we can integrate in polar coordinates to obtain
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ207_HTML.png
It follows immediately that the last integral is finite, proving that a randomly chosen https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq723_HTML.gif has finite expected \(\kappa _{\mathrm {ang}}\). Finally, if https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq725_HTML.gif is chosen randomly in the unit ball in \(\sigma _2\), the same argument shows that the expected value is again finite:
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ208_HTML.png
This finishes the proof of Theorem 4.

7 Numerical Experiments

Having proved that the expected value of the condition number is infinite in most cases, we provide further computational evidence in support of Conjecture 1. To this end, a natural idea is to perform Monte Carlo experiments in a few of the unknown cases as in [23].
Sampling GITs is hard in practice, as the defining polynomial equalities and inequalities of the semialgebraic set \(\sigma _r = \sigma _{r;n_1,\ldots ,n_d}\) of tensors of rank bounded by r are not known in the literature. Nevertheless, there are a few cases that we can treat numerically. If \(r = \frac{\varPi }{\varSigma }\) and the algebraic closure \(\overline{\sigma _r}(\mathbb {C})\) has \(\dim \overline{\sigma _r}(\mathbb {C}) = \varPi \) (a so-called perfect tensor space), then \(\sigma _r\) is an open subset of the ambient \(\mathbb {R}^{n_1 \times \cdots \times n_d}\); see, e.g., [15, 57].
From Remark 2, we can sample from the density \(\rho \) on \(\sigma _r\) via an acceptance–rejection method: Randomly sample tensors https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq736_HTML.gif from the density https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq737_HTML.gif until we find one that belongs to \(\sigma _r\). While this scheme will yield tensors distributed according to the density \(\rho \) on \(\sigma _r\), it does not yield Gaussian identifiable tensors in general. The reason is that most perfect tensor spaces are not (expected to be) generically r-identifiable [47]. Fortunately, there are a few known exceptions: matrix pencils (\(\mathbb {R}^{n \times n \times 2}\) for all \(n\ge 2\)), \(\mathbb {R}^{5 \times 4 \times 3}\) and \(\mathbb {R}^{3 \times 2 \times 2 \times 2}\) are proved to be generically complex r-identifiable for \(r = \frac{\varPi }{\varSigma }\). By applying the acceptance–rejection method to these spaces, every sampled tensor is a GIT with probability 1.
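In the generically r-identifiable perfect spaces listed above, the rejection loop itself is simple. The following is a minimal sketch (not the code used for our experiments), in which `in_sigma_r` stands for a hypothetical membership test for \(\sigma _r\); one possible realization of this test via homotopy continuation is sketched after the next paragraph.
```julia
# Minimal sketch of the acceptance-rejection sampler. `in_sigma_r` is a
# hypothetical predicate deciding membership in sigma_r (see the homotopy
# continuation sketch below); it is not part of any package.
function sample_git(dims, in_sigma_r; maxtries = 100_000)
    for _ in 1:maxtries
        T = randn(dims...)           # density proportional to the Gaussian on the ambient space
        in_sigma_r(T) && return T    # accept the first tensor of rank at most r
    end
    error("no tensor accepted after $maxtries draws")
end
```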
To check numerically whether a random tensor https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq746_HTML.gif in a perfect tensor space lies in \(\sigma _r\) with \(r = \frac{\varPi }{\varSigma }\), we apply a homotopy continuation method to the square system of \(\varPi \) equations
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ209_HTML.png
where the \(\varPi = r \varSigma \) entries of the \(\mathbf {a}_{i}^{k}\)’s are treated as variables, and the \(n_1 \times \cdots \times n_d\) tensor https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq753_HTML.gif is the tensor to decompose. We generate a start system with one solution to track by randomly sampling the entries of the vectors \(\mathbf {a}_{i}^{k}\) i.i.d. from a real standard Gaussian distribution and then constructing the corresponding tensor https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq755_HTML.gif . Since \(r = \frac{\varPi }{\varSigma }\) is the so-called generic rank of tensors in perfect tensor spaces \(\mathbb {C}^{n_1 \times \cdots \times n_d}\), the above system has at least one complex solution with probability 1 as well. If we consider complex r-identifiable perfect tensor spaces at the generic rank, we can thus determine if https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq758_HTML.gif by solving the square system and checking whether the unique solution is real. Assuming that we use a certified homotopy method such as alphaCertified [48], this approach will correctly classify https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq759_HTML.gif with probability 1, thus not impacting the overall distribution produced by the acceptance–rejection scheme.
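As an illustration, the sketch below implements this membership test for \(2 \times 2 \times 2\) tensors with \(r = 2\). It is not the code used for the experiments reported here: it is written against the current HomotopyContinuation.jl API rather than version 0.4.3, and the dehomogenization (fixing the leading entries of the second and third factor vectors to 1) is our own choice to obtain a square system; with probability 1 the factors of a generic tensor can be scaled to this form.
```julia
# Sketch of the realness test for 2x2x2 tensors with r = 2 (Pi = 8 equations in
# r*Sigma = 8 unknowns after dehomogenization). Written for the current
# HomotopyContinuation.jl API; the paper's experiments used v0.4.3.
using HomotopyContinuation

n1, n2, n3, r = 2, 2, 2, 2
@var a[1:r, 1:n1] b[1:r, 1:n2-1] c[1:r, 1:n3-1] t[1:n1, 1:n2, 1:n3]
B(i) = [1; b[i, :]]                  # second factor with leading entry fixed to 1
C(i) = [1; c[i, :]]                  # third factor with leading entry fixed to 1
eqs = [sum(a[i, i1] * B(i)[i2] * C(i)[i3] for i in 1:r) - t[i1, i2, i3]
       for i1 in 1:n1, i2 in 1:n2, i3 in 1:n3]
F = System(vec(eqs); variables = [vec(a); vec(b); vec(c)], parameters = vec(t))

# start pair: random real factors in the chart and the rank-2 tensor they sum to
a0, b0, c0 = randn(r, n1), randn(r, n2 - 1), randn(r, n3 - 1)
T0 = sum(a0[i, :] .* reshape([1; b0[i, :]], 1, n2) .* reshape([1; c0[i, :]], 1, 1, n3)
         for i in 1:r)
start = [vec(a0); vec(b0); vec(c0)]

# track the start solution from T0 to the target tensor T and test realness
function in_sigma_r(T; tol = 1e-8)
    res = solve(F, [start]; start_parameters = vec(T0), target_parameters = vec(T))
    sols = solutions(res)
    return !isempty(sols) && maximum(abs.(imag.(sols[1]))) < tol
end

T = sample_git((n1, n2, n3), in_sigma_r)   # reuses the rejection loop sketched above
```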
We implemented the above scheme in Julia 1.0.3 using version 0.4.3 of the package HomotopyContinuation.jl [19], employing the solve function with default parameter settings. We deem a solution real if the norm of the imaginary part is less than \(10^{-8}\). Note that this package does not offer certified tracking; however, the failure rate observed in our experiments was very low, namely 0.0512498%—see Table 1. For this reason, we are convinced that the distribution produced by the acceptance–rejection scheme is very close to the true distribution.
We performed the following experiment to estimate the distribution of the condition numbers of GITs of generically complex r-identifiable tensors in perfect tensor spaces with \(r = \frac{\varPi }{\varSigma }\), the complex generic rank. As explained above, we randomly sampled an element https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq762_HTML.gif of \(\mathbb {R}^{n_1 \times \cdots \times n_d}\) from the density https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq764_HTML.gif by choosing its entries i.i.d. standard normally distributed. Then, we generated one random start system and applied the solve function from HomotopyContinuation.jl to track the start solution https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq765_HTML.gif to the target https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq766_HTML.gif . If the final solution of the square system was real, we recorded both the regular and angular condition numbers at the CPD of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq767_HTML.gif computed via homotopy continuation. These computations were performed in parallel using 20 computational threads until 100,000 finite, nonsingular, real solutions and corresponding condition numbers were obtained. This experiment was performed on a computer system consisting of 2 Intel Xeon E5-2697 v3 CPUs with 12 cores clocked at 2.6 GHz and 128 GB main memory. Information about the sampling process via the acceptance–rejection method is summarized in Table 1, and Fig. 1 visualizes the complementary cumulative distribution functions of the regular and angular condition numbers.
Table 1
Results of sampling GITs in \(\sigma _{r;n_1,n_2,n_3} \subset \mathbb {R}^{n_1 \times n_2 \times n_3}\) via an acceptance–rejection method
\(n_1 \times n_2 \times n_3\) | r | Samples: \(\mathbb {R}\) | Samples: \(\mathbb {C}\) | Samples: failed | Fraction in \(\mathbb {R}\) | Time (min)
\(2 \times 2 \times 2\) | 2 | 100,000 | 27,335 | 41 | \(\underline{0.785}3\ldots \approx \frac{\pi }{4}\) | 1.3
\(3 \times 3 \times 2\) | 3 | 100,000 | 101,345 | 185 | \(\underline{0.49}66\ldots \approx \frac{1}{2}\) | 2.8
\(4 \times 4 \times 2\) | 4 | 100,000 | 288,770 | 325 | \(\underline{0.25}72\ldots \approx \frac{27 \pi ^2}{1024}\) | 14.9
\(5 \times 4 \times 3\) | 6 | 100,000 | 1,237,912 | 643 | \(0.0747\ldots \) | 420.6
\(5 \times 5 \times 2\) | 5 | 100,000 | 810,254 | 509 | \(\underline{0.10}98\ldots \approx \frac{1}{9}\) | 99.3
Columns three to five list the number of samples where the final tracked solution of the homotopy was real, complex or failed, respectively. The next column shows the fraction of successful samples that were real; in the case of \(n \times n \times 2\) the analytical solution from Bergqvist and Forrester [13] is also stated and the correct digits from the empirical estimate are underlined. The final column indicates the total wall-clock time required to perform the Monte Carlo experiments
In Table 1, the total fractions of solutions that are real when sampling random Gaussian tensors (with i.i.d. standard normally distributed entries) seem to agree very well with the known theoretical results by Bergqvist and Forrester [13]; they showed that for random Gaussian \(n \times n \times 2\) tensors the rank is \(n = \frac{\varPi }{\varSigma }\) with probability \(p_n := \varGamma \bigl (\frac{n+1}{2}\bigr )^n \bigl ( G(n+1) \bigr )^{-1}\), where \(\varGamma \) is the gamma function and G the Barnes G-function (or double gamma function). The correct digits in the numerical approximation are underlined in the penultimate column of Table 1.
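For reference, these probabilities are easy to evaluate; the sketch below (ours, not taken from [13]) uses the identity \(G(n+1) = \prod _{k=1}^{n-1} k!\) for the Barnes G-function at integer arguments.
```julia
# Evaluate p_n = gamma((n+1)/2)^n / G(n+1) for the n x n x 2 cases of Table 1,
# using G(n+1) = prod_{k=1}^{n-1} k! at integer arguments.
using SpecialFunctions  # provides gamma

barnesG(m::Integer) = prod(factorial(big(k)) for k in 1:m-2)   # Barnes G(m) for integer m >= 3
p(n::Integer) = Float64(gamma((n + 1) / 2)^n / barnesG(n + 1))

[p(n) for n in 2:5]   # [pi/4, 1/2, 27*pi^2/1024, 1/9] up to floating-point error
```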
The empirical complementary cumulative distribution functions of the regular and angular condition numbers are shown in Fig. 1. The solid lines correspond to the empirical data and the thinner dashed lines to an exponential model fitted to the data. From the figure, it seems reasonable to postulate that the complementary cumulative distribution function c(x) for large x has the form
$$\begin{aligned} c(x) = 1 - \int _0^x p(t) \,\mathrm {d}{}t = a x^{-b}, \end{aligned}$$
(37)
which corresponds to a straight line in the log–log plot in Fig. 1. We fitted the parameters a and b of the postulated model to the data restricted to the range \(10^{-3} \le c(x) \le 10^{-1}\). The reason for restricting the data set is that for small condition numbers it is visually evident in Fig. 1 that the model is incorrect, and for large condition numbers the data contain few samples, which negatively impacts the robustness of the fit. The parameters were fitted using fminsearch from MATLAB R2017b with starting point (1, 1) and default settings. In all cases, the algorithm terminated because the relative change of the parameters fell below \(10^{-4}\). The obtained parameters are shown in Table 2, along with the coefficient of determination \(R^2\) between the log-transformed data and the log-transformed model predictions; a value of 1 indicates a perfect fit.
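The fit is straightforward to approximate without fminsearch; the following sketch (not the MATLAB code we used) fits \(\log c(x) = \log a - b\log x\) by linear least squares over the restricted range, where `kappas` stands for a hypothetical vector of recorded condition numbers.
```julia
# Sketch: fit the tail model c(x) ~ a*x^(-b) to an empirical complementary CDF
# by linear least squares in log-log coordinates over 1e-3 <= c(x) <= 1e-1.
using LinearAlgebra

function fit_tail(kappas; clo = 1e-3, chi = 1e-1)
    x = sort(kappas)
    m = length(x)
    c = (m:-1:1) ./ m                     # empirical tail fraction at each sorted sample
    keep = (c .>= clo) .& (c .<= chi)     # restrict the fitting range
    A = [ones(sum(keep)) log.(x[keep])]
    coef = A \ log.(c[keep])              # log c = log a - b*log x
    return exp(coef[1]), -coef[2]         # (a, b)
end

# usage on synthetic data with known tail exponent 0.65:
a, b = fit_tail(rand(100_000) .^ (-1 / 0.65))   # b comes out near 0.65
```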
Table 2
Estimated parameters of the exponential model Eq. (37) fitted to the complementary cumulative distribution functions from Fig. 1
\(n_1 \times n_2 \times n_3\) | Regular: a | Regular: b | Regular: \(R^2\) | Angular: a | Angular: b | Angular: \(R^2\)
\(2 \times 2 \times 2\) | 0.6624 | 0.6904 | 0.9999 | 1.7288 | 1.8624 | 0.9995
\(3 \times 3 \times 2\) | 2.2348 | 0.6636 | 0.9999 | 5.5496 | 1.8856 | 0.9998
\(4 \times 4 \times 2\) | 4.7318 | 0.6388 | 0.9997 | 11.4165 | 1.8455 | 0.9994
\(5 \times 4 \times 3\) | 22.3141 | 0.6461 | 0.9998 | 102.4887 | 1.6337 | 0.9992
\(5 \times 5 \times 2\) | 9.8634 | 0.6436 | 0.9997 | 23.6951 | 1.8662 | 0.9996
The coefficient of determination \(R^2\) between the log-transformed data and log-transformed model predictions is also indicated
We may estimate the expected values of the regular and angular condition numbers based on their empirical distributions. If p(x) denotes the probability density function of the regular condition number, then
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ210_HTML.png
From the postulated model of c(x), we find that \(p(x) = ab x^{-b-1}\) for large x, so that we postulate that the expected value will be well approximated by
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ211_HTML.png
for some finite \(\kappa _0\). The expression on the right is finite only if \(b > 1\). The same discussion applies to the angular condition number as well. Given the estimated parameters in Table 2, our empirical data strongly suggest that the expected value of the condition number is infinite for \(r = \frac{\varPi }{\varSigma }\) in the tested cases, as \(b \approx 0.6 < 1\). On the other hand, the expected angular condition number seems finite in all cases, as \(b \approx 1.8 > 1\). This suggests that both Theorems 2 and 3 might hold for higher ranks as well.
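For concreteness, the finiteness criterion invoked above follows from evaluating the tail integral of the postulated model explicitly:
$$\begin{aligned} \int _{\kappa _0}^{\infty } x\, p(x) \,\mathrm {d}{}x = \int _{\kappa _0}^{\infty } ab\, x^{-b} \,\mathrm {d}{}x = {\left\{ \begin{array}{ll} \frac{ab}{b-1}\, \kappa _0^{1-b}, &{} b > 1,\\ \infty , &{} b \le 1, \end{array}\right. } \end{aligned}$$
so the tail contribution to the expected value is finite precisely when \(b > 1\).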

Acknowledgements

We thank the reviewers for helpful suggestions. Part of this work was done while the second and third authors were visiting the Universidad de Cantabria, supported by the funds of Grant 21.SI01.64658 (Banco Santander and Universidad de Cantabria) and Grant MTM2017-83816-P from the Spanish Ministry of Science. The third author was additionally supported by the FWO Grant for a long stay abroad V401518N. We thank these institutions for their support.

Open Access

This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix

Proof of the lemmata from Section 3

Proof of Lemma 3

Integrating in polar coordinates, we have
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ212_HTML.png
The change of variables https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq806_HTML.gif transforms the integral for \(\rho \) into
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ213_HTML.png
The last integral is \(2^{\varSigma -1}\varGamma (\varSigma )\). Plugging this into the equation above shows the assertion. \(\square \)

Proof of Lemma 4

Expanding the denominator and taking a change of variables by \(\varphi =2\theta \), the lemma reduces to proving that
$$\begin{aligned} \int _0^{\pi /2}\frac{\cos ^{s-1}\varphi }{(1\pm a\cos \varphi )^s} \mathrm {d}\varphi =\int _0^{\pi /2}\frac{\sin ^{s-1}\varphi }{(1\pm a\sin \varphi )^s} \mathrm {d}\varphi \ge \frac{k(s)}{(1\pm a)^{s-1/2}}, \end{aligned}$$
(for the first equality, make the change of variables \(\varphi \rightarrow \pi /2-\varphi \)) where we are denoting \(a=|\langle \mathbf {x},\mathbf {y}\rangle |\in [0,1]\). With the \(+\) sign the inequality is clear (choosing an appropriate k(s)). For the other case, we have, up to a constant k(s), the lower bound
$$\begin{aligned} \int _0^{\pi /4}\frac{\mathrm {d}\varphi }{(1- a\cos \varphi )^s} \ge \int _0^{\pi /4}\frac{\mathrm {d}\varphi }{\left( 1- a\left( 1-\varphi ^2/2\right) \right) ^s} =\frac{1}{(1-a)^s}\int _0^{\pi /4} \frac{\mathrm {d}\varphi }{\left( 1+\frac{a}{1-a}\frac{\varphi ^2}{2}\right) ^s} . \end{aligned}$$
Thus, it suffices to check that
$$\begin{aligned} \int _0^{\pi /4} \frac{1}{\left( 1+\frac{a}{1-a}\frac{\varphi ^2}{2}\right) ^s} d \varphi \ge k(s)\sqrt{1-a}, \end{aligned}$$
but this is easily checked by taking the change of variables \(\varphi ^2=(1-a)t\). \(\square \)

Proof of Lemma 6

Recall from the definition of \(D(\epsilon )\) that \(\Vert \mathbf {u}^1-\mathbf {v}^1\Vert < \epsilon \), and that \(\frac{9}{10} \Vert \mathbf {u}^1 - \mathbf {v}^1\Vert< \Vert \mathbf {u}^k - \mathbf {v}^k\Vert <\Vert \mathbf {u}^1 - \mathbf {v}^1\Vert \) for \(2\le k\le d\). If \(\epsilon > 0\) is sufficiently small, we can assume
$$\begin{aligned} \delta _k := \langle \mathbf {u}^k, \mathbf {v}^k \rangle \ge \frac{9}{10}, \quad 1 \le k \le d. \end{aligned}$$
(38)
To prove the lemma, it suffices to show that
$$\begin{aligned} q(U)^2 \ge {2^{-d}}\left( \frac{\Vert \mathbf {u}^1 - \mathbf {v}^1 \Vert ^2}{4} \right) ^{\varSigma - 1} = {2^{-d}} \left( \frac{1 - \delta _1}{2} \right) ^{\varSigma - 1} \end{aligned}$$
for sufficiently small \(\epsilon \).
For convenience, we will first introduce a few auxiliary variables. Consider the next picture:
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ214_HTML.png
Then, for small \(\theta > 0\), we have the following elementary trigonometric relations:
$$\begin{aligned} \delta _k&:= \cos \theta = \langle \mathbf {u}^k,\mathbf {v}^k\rangle = \sqrt{\tfrac{1}{\epsilon _k^2+1}}, \\ \zeta _k&:= \sin \theta = \sqrt{1 - \delta _k^2} = \delta _k \epsilon _k, \text { and }\nonumber \\ \epsilon _k&:= \tan \theta = \sqrt{\tfrac{1}{\delta _k^2}-1}.\nonumber \end{aligned}$$
(39)
It follows from the definition of \(D(\epsilon )\) that
$$\begin{aligned} \delta _1 = \min \, \{\delta _1,\ldots ,\delta _d\} \quad \text { and } \quad \epsilon _1 = \max \, \{\epsilon _1,\ldots ,\epsilon _d\}.\nonumber \end{aligned}$$
From the previous figure, it is also clear that
$$\begin{aligned} \zeta _k = \delta _k \epsilon _k \le \Vert \mathbf {u}^k - \mathbf {v}^k \Vert \le \epsilon _k, \quad \text {and so}\quad \left( \frac{9}{10}\right) ^2 \epsilon _1 \le \frac{9}{10} \Vert \mathbf {u}^1 - \mathbf {v}^1\Vert \le \Vert \mathbf {u}^k - \mathbf {v}^k\Vert \le \epsilon _k, \end{aligned}$$
(40)
having used in the right sequence of inequalities that Eq. (38) implies \(\frac{9}{10} \delta _1 \epsilon _1 \ge \left( \frac{9}{10} \right) ^2 \epsilon _1\). Then,
$$\begin{aligned} 1-\frac{\epsilon _1^2}{2}\le 1-\frac{\epsilon _k^2}{2}\le \delta _k=\sqrt{\tfrac{1}{\epsilon _k^2+1}}\le 1-\frac{\epsilon _k^2}{4}\le 1-\frac{1}{4}\left( \frac{9}{10}\right) ^4\epsilon _1^2. \end{aligned}$$
(41)
Finally, we will also use
$$\begin{aligned} z := \delta _1\cdots \delta _d. \end{aligned}$$
(42)

Transforming q(U)

For computing q(U), we will first make a convenient orthogonal transformation of U’s columns. Recall from Eq. (24) that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq825_HTML.gif . As in Eq. (22), we write https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq826_HTML.gif . Then, \(q(U) = q(\begin{bmatrix} M&L_1&L_2 \end{bmatrix})\). The block structure of \(L_1,L_2\in \mathbb {R}^{\varPi \times (\varSigma -1)}\) was given in Eq. (23): \(L_1\) is made up of the d blocks
$$\begin{aligned} L_1^k = \mathbf {u}_{}^{1} \otimes \cdots \otimes \mathbf {u}_{}^{k-1} \otimes \dot{U}^k \otimes \mathbf {u}_{}^{k+1} \otimes \cdots \otimes \mathbf {u}_{}^{d}, \quad 1\le k\le d, \end{aligned}$$
and \(L_2\) is analogously made up of the d blocks
$$\begin{aligned} L_2^k = \mathbf {v}_{}^{1} \otimes \cdots \otimes \mathbf {v}_{}^{k-1} \otimes \dot{V}^k \otimes \mathbf {v}_{}^{k+1} \otimes \cdots \otimes \mathbf {v}_{}^{d}, \quad 1\le k\le d, \end{aligned}$$
where \( \dot{U}^k = \begin{bmatrix} \dot{\mathbf{u }}_{2}^{k}&\dot{\mathbf{u }}_{3}^{k}&\cdots&\dot{\mathbf{u }}_{n_k}^{k} \end{bmatrix} \in \mathbb {R}^{n_k \times (n_k-1)}\) and \(\dot{V}^k = \begin{bmatrix} \dot{\mathbf{v }}_{2}^{k}&\dot{\mathbf{v }}_{3}^{k}&\cdots&\dot{\mathbf{v }}_{n_k}^{k} \end{bmatrix} \in \mathbb {R}^{n_k \times (n_k-1)} \) are matrices whose columns form orthonormal bases of \((\mathbf {u}^k)^\perp := \mathrm {T}_{\mathbf {u}^k} {\, \mathbb {S}(\mathbb {R}^{n_k})}\) and \((\mathbf {v}^k)^\perp := \mathrm {T}_{\mathbf {v}^k} {\, \mathbb {S}(\mathbb {R}^{n_k})}\), respectively.
The columns of M are rotated into
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ43_HTML.png
(43)
while the columns of \(L_1\) and \(L_2\) are rotated as follows. We define the \(\varPi \times (\varSigma -1)\)-matrices \(R_\downarrow := \begin{bmatrix} R_\downarrow ^1&\ldots&R_\downarrow ^d\end{bmatrix}\) and \(R_\uparrow := \begin{bmatrix} R_\uparrow ^1&\ldots&R_\uparrow ^d\end{bmatrix}\), where
$$\begin{aligned} R_\downarrow ^k = \frac{1}{\sqrt{2}}(L_1^k - L_2^k) \quad \text {and} \quad R_\uparrow ^k = \frac{1}{\sqrt{2}}(L_1^k + L_2^k). \end{aligned}$$
(44)
The reason for using \(\downarrow \) and \(\uparrow \) will become clear from the computations below: Inner products of two quantities with arrows pointing in opposite directions are zero, and swapping the directions of both arrows flips a sign in the expression of the inner product.
Now, instead of considering the matrix U we work with the matrix
$$\begin{aligned} N:= \begin{bmatrix} \mathbf {a}_\downarrow&R_\downarrow&\mathbf {a}_\uparrow&R_\uparrow \end{bmatrix}. \end{aligned}$$
By construction, \(\begin{bmatrix} M&L_1&L_2\end{bmatrix} = N Q\) for some orthogonal matrix Q, so that
$$\begin{aligned} q(U) = q(\begin{bmatrix} \mathbf {a}_\downarrow&R_\downarrow&\mathbf {a}_\uparrow&R_\uparrow \end{bmatrix}) = q(N) \quad \text { and }\quad q(U)^2 = q(N^T N). \end{aligned}$$
Note that the choice of \(\dot{U}^k\) and \(\dot{V}^k\) does not affect the value of q. In particular, we may choose the matrices as follows. Let H be the plane in \(\mathbb {R}^{n_k}\) spanned by \(\mathbf {u}^k\) and \(\mathbf {v}^k\). By definition of \(D(\epsilon )\), \(\mathbf {u}^k \ne \pm \mathbf {v}^k\). Let O be the rotation that sends \(\mathbf {u}^k\) to \(\mathbf {v}^k\), but leaves the orthogonal complement \(H^\perp \) of H fixed. Take \(\dot{\mathbf{u }}_{2}^{k} \in H\) with \(\dot{\mathbf {u}}_2^k \perp \mathbf {u}^k\) as the unit norm vector making the smallest angle with \(\mathbf {v}^k\), as follows:
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ215_HTML.png
We also take \(\dot{\mathbf{v }}_{2}^{k} := O \dot{\mathbf{u }}_{2}^{k}\), as in the illustration above. If \(\mathbf {h}_3,\ldots , \mathbf {h}_{n_k}\) is any orthonormal basis of \(H^\perp \), then our choice of bases is
$$\begin{aligned} \dot{\mathbf{u }}_{2}^{k} \text { and } \dot{\mathbf{v }}_{2}^{k} = O \dot{\mathbf{u }}_{2}^{k} \text { as above, and for } 3\le j \le n_k: \dot{\mathbf{u }}_{j}^{k} = \dot{\mathbf{v }}_{j}^{k} = \mathbf {h}_j. \end{aligned}$$
(45)
In other words, we can assume that all but the first columns of \(\dot{U}^k\) and \(\dot{V}^k\) are equal. Moreover, as can be seen from the foregoing figure, the following properties hold:
$$\begin{aligned} \langle \dot{\mathbf{u }}_{2}^{k}, \dot{\mathbf{v }}_{2}^{k} \rangle&= \langle Q \mathbf {u}^k, Q \mathbf {v}^k \rangle = \langle \mathbf {u}^k, \mathbf {v}^k\rangle = \delta _k, \\ \langle \dot{\mathbf{u }}_{2}^{k}, \mathbf {v}^k \rangle&= \cos \left( \frac{\pi }{2} - \arccos ( \delta _k ) \right) = \sqrt{1 - \delta _k^2} = \delta _k \epsilon _k,\nonumber \\ \langle \dot{\mathbf{v }}_{2}^{k}, \mathbf {u}^k \rangle&= \cos \left( \frac{\pi }{2} + \arccos ( \delta _k ) \right) = -\sqrt{1 - \delta _k^2} = -\delta _k \epsilon _k,\nonumber \end{aligned}$$
(46)
where Q is a rotation by \(\frac{\pi }{2}\) radians in the same direction as the rotation O. In particular, we have
$$\begin{aligned} (\dot{U}^k)^T \mathbf {v}^k = \delta _k \epsilon _k \mathbf {e}^k \text { and } (\dot{V}^k)^T \mathbf {u}^k = -\delta _k \epsilon _k \mathbf {e}^k, \quad \text {where}\quad \mathbf {e}^k := (1, 0, \ldots , 0)^T\in \mathbb {R}^{n_k-1}. \end{aligned}$$
Consider then the Gram matrix \(G = N^T N\) for this particular choice of tangent vectors:
$$\begin{aligned} G = \begin{bmatrix} \mathbf {a}_\downarrow ^T\mathbf {a}_\downarrow &{} \mathbf {a}_\downarrow ^TR_\downarrow &{} \mathbf {a}_\downarrow ^T\mathbf {a}_\uparrow &{} \mathbf {a}_\downarrow ^TR_\uparrow \\ R_\downarrow ^T\mathbf {a}_\downarrow &{} R_\downarrow ^TR_\downarrow &{} R_\downarrow ^T\mathbf {a}_\uparrow &{} R_\downarrow ^TR_\uparrow \\ \mathbf {a}_\uparrow ^T\mathbf {a}_\downarrow &{} \mathbf {a}_\uparrow ^TR_\downarrow &{} \mathbf {a}_\uparrow ^T\mathbf {a}_\uparrow &{} \mathbf {a}_\uparrow ^TR_\uparrow \\ R_\uparrow ^T\mathbf {a}_\downarrow &{} R_\uparrow ^TR_\downarrow &{} R_\uparrow ^T\mathbf {a}_\uparrow &{} R_\uparrow ^TR_\uparrow . \end{bmatrix}. \end{aligned}$$
We continue by computing its entries.

Inner Products Involving Only \(\mathbf {a}\)

Using Eq. (10) for computing inner products of rank-1 tensors, we see that
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ47_HTML.png
(47)

Inner Products Involving both \(\mathbf {a}\) and R

For each k, we have https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq865_HTML.gif ,
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ216_HTML.png
This implies
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ217_HTML.png
having used Eqs. (46) and (10). Combining the above, we obtain
$$\begin{aligned} R_\uparrow ^T\mathbf {a}_\downarrow = R_\downarrow ^T\mathbf {a}_\uparrow =0,\quad R_\downarrow ^T\mathbf {a}_\downarrow = z\mathbf {f}, \quad \text {and} \quad R_\uparrow ^T\mathbf {a}_\uparrow = -z \mathbf {f}, \text { where } \mathbf {f}=\begin{bmatrix}\epsilon _1\mathbf {e}^1\\ \vdots \\ \epsilon _d\mathbf {e}^{d}\end{bmatrix}. \end{aligned}$$
(48)

Inner Products Involving only R

By construction, we have \((\dot{U}^k)^T \dot{U}^k = (\dot{V}^k)^T \dot{V}^k =\mathrm {I}_{n_k-1} \), where \(\mathrm {I}_{n_k-1}\) is the \((n_k-1)\times (n_k-1)\) identity matrix. Furthermore, by our choice of tangent vectors from Eqs. (45) and (46), we have \((\dot{U}^k)^T \dot{V}^k = \mathrm {diag}(\langle \dot{\mathbf{u }}_{j}^{k},\dot{\mathbf{v }}_{j}^{k} \rangle )_{j=2}^{n_k} = \mathrm {diag}(\delta _k, 1,\ldots , 1).\) This implies
$$\begin{aligned} (L_1^k)^T L_1^k = (L_2^k)^T L_2^k = \mathrm {I}_{n_k -1}, \quad (L_1^k)^T L_2^k = \big (\prod _{i\ne k}\delta _i\big ) \, (\dot{U}^k)^T \dot{V}^k = z \cdot \mathrm {diag}(1, \delta _k^{-1}, \ldots , \delta _k^{-1}). \end{aligned}$$
Moreover, for \(j \ne k\) we have
$$\begin{aligned} (L_1^j)^T L_1^k&= (L_2^j)^T L_2^k = 0, \\ (L_1^j)^T L_2^k&= \Bigl (\prod _{i\not \in \{k,j\}} \delta _i\Bigr ) \cdot {\left\{ \begin{array}{ll} \bigl ((\dot{U}^j)^T \mathbf {v}^j \bigr ) \otimes \bigl ( (\mathbf {u}^k)^T \dot{V}^k \bigr ), &{} j < k, \\ \bigl ( (\mathbf {u}^k)^T \dot{V}^k \bigr ) \otimes \bigl ( (\dot{U}^j)^T \mathbf {v}^j \bigr ), &{} j > k \end{array}\right. } = -z \epsilon _j \epsilon _k \mathbf {e}^j (\mathbf {e}^k)^T. \end{aligned}$$
From this, we get
$$\begin{aligned} (R_\downarrow ^j)^T R_\downarrow ^k&= \frac{1}{2}(L_1^j - L_2^j)^T (L_1^k - L_2^k) = {\left\{ \begin{array}{ll} \mathrm {I}_{n_k -1} -z \cdot \mathrm {diag}(1, \delta _k^{-1}, \ldots , \delta _k^{-1}),&{} j=k\\ z \epsilon _j \epsilon _k \,\mathbf {e}^j (\mathbf {e}^k)^T ,&{} j\ne k\end{array}\right. },\\ (R_\downarrow ^j)^T R_\uparrow ^k&= \frac{1}{2}(L_1^j - L_2^j)^T (L_1^k + L_2^k) = 0, \\ (R_\uparrow ^j)^T R_\uparrow ^k&= \frac{1}{2}(L_1^j + L_2^j)^T (L_1^k + L_2^k) = {\left\{ \begin{array}{ll} \mathrm {I}_{n_k -1} + z \cdot \mathrm {diag}(1, \delta _k^{-1}, \ldots , \delta _k^{-1}),&{} j=k\\ -z \epsilon _j \epsilon _k \, \mathbf {e}^j (\mathbf {e}^k)^T,&{} j\ne k\end{array}\right. }. \end{aligned}$$
Note that
$$\begin{aligned} \mathrm {I}_{n_k-1} - z \cdot \mathrm {diag}(1, \delta _k^{-1}, \ldots , \delta _k^{-1}) = \mathrm {I}_{n_k-1} - z \cdot \mathrm {diag}(1+\epsilon _k^2, \delta _k^{-1}, \ldots , \delta _k^{-1}) + z \epsilon _k^2 \mathbf {e}^k(\mathbf {e}^k)^T, \end{aligned}$$
and
$$\begin{aligned} \mathrm {I}_{n_k-1} + z \cdot \mathrm {diag}(1, \delta _k^{-1}, \ldots , \delta _k^{-1}) = \mathrm {I}_{n_k-1} + z \cdot \mathrm {diag}(1+\epsilon _k^2, \delta _k^{-1}, \ldots , \delta _k^{-1}) - z \epsilon _k^2 \mathbf {e}^k(\mathbf {e}^k)^T. \end{aligned}$$
Exploiting the definition of the vector \(\mathbf {f}\) in Eq. (48), the foregoing can be expressed concisely as
$$\begin{aligned} R_\downarrow ^T R_\downarrow&= D_\downarrow + z\mathbf {f}\mathbf {f}^T, \quad R_\downarrow ^T R_\uparrow =0, \quad \text {and}\quad R_\uparrow ^T R_\uparrow = D_\uparrow - z\mathbf {f}\mathbf {f}^T, \end{aligned}$$
where we introduced
$$\begin{aligned} D_\downarrow&= \mathrm {I}_{\varSigma -1} - z \cdot \mathrm {diag}\left( 1+\epsilon _1^2, \underbrace{\delta _1^{-1}, \ldots , \delta _1^{-1}}_{(n_1-2) \text {-times}}\,, \ldots , 1 + \epsilon _d^2, \underbrace{\delta _d^{-1}, \ldots , \delta _d^{-1}}_{(n_d-2) \text {-times}}\right) , \text { and} \\ D_\uparrow&= \mathrm {I}_{\varSigma -1} + z \cdot \mathrm {diag}\left( 1+\epsilon _1^2, \underbrace{\delta _1^{-1}, \ldots , \delta _1^{-1}}_{(n_1-2) \text {-times}}\,, \ldots , 1 + \epsilon _d^2, \underbrace{\delta _d^{-1}, \ldots , \delta _d^{-1}}_{(n_d-2) \text {-times}}\right) . \end{aligned}$$

Putting Everything Together

From the definition of the vector \(\mathbf {f}\) in Eq. (48), it is clear that we can construct a permutation matrix P that moves the nonzero elements of \(\mathbf {f}\) to the first d positions:
$$\begin{aligned} P \mathbf {f} = \begin{bmatrix} \mathbf {g} \\ 0 \end{bmatrix}, \text { where } \mathbf {g} = \begin{bmatrix} \epsilon _1&\cdots&\epsilon _d \end{bmatrix}^T \end{aligned}$$
and 0 is a vector of zeros of length \(\varSigma -1-d\). Applying \(P^T\) on the right of the R’s yields
$$\begin{aligned} R_\downarrow P^T := \begin{bmatrix} T_\downarrow&S_\downarrow \end{bmatrix} \quad \text {and}\quad R_\uparrow P^T := \begin{bmatrix} T_\uparrow&S_\uparrow \end{bmatrix}, \end{aligned}$$
(49)
where the T matrices are, respectively, given by
$$\begin{aligned} \sqrt{2} T_\downarrow = [ \mathbf {u}^1 \otimes \cdots \otimes \mathbf {u}^{k-1} \otimes \dot{\mathbf{u }}_{2}^{k} \otimes \mathbf {u}^{k+1} \otimes \cdots&\otimes \mathbf {u}^d - \mathbf {v}^1 \otimes \\&\cdots \otimes \mathbf {v}^{k-1} \otimes \dot{\mathbf{v }}_{2}^{k} \otimes \mathbf {v}^{k+1} \otimes \cdots \otimes \mathbf {v}^d ]_{k=1}^d, \end{aligned}$$
and analogously for \(T_\uparrow \) replacing the subtraction by an addition; the matrices \(S_\downarrow \) and \(S_\uparrow \) contain the remainder of the columns of \(R_\downarrow \) and \(R_\uparrow \), respectively. Then, we have
$$\begin{aligned} P (D_\downarrow + z \mathbf {f} \mathbf {f}^T) P^T&= \begin{bmatrix} E_\downarrow &{} 0 \\ 0 &{} F_\downarrow \end{bmatrix} + z \begin{bmatrix} \mathbf {g} \\ 0 \end{bmatrix} \begin{bmatrix} \mathbf {g}^T&0^T \end{bmatrix} = \begin{bmatrix} E_\downarrow + z \mathbf {g} \mathbf {g}^T &{} 0 \\ 0 &{} F_\downarrow \end{bmatrix} = \begin{bmatrix} T_\downarrow ^T T_\downarrow &{} T_\downarrow ^T S_\downarrow \\ S_\downarrow ^T T_\downarrow &{} S_\downarrow ^T S_\downarrow \end{bmatrix}, \text { and} \\ P (D_\uparrow - z \mathbf {f} \mathbf {f}^T) P^T&= \begin{bmatrix} E_\uparrow &{} 0 \\ 0 &{} F_\uparrow \end{bmatrix} - z \begin{bmatrix} \mathbf {g} \\ 0 \end{bmatrix} \begin{bmatrix} \mathbf {g}^T&0^T \end{bmatrix} = \begin{bmatrix} E_\uparrow - z \mathbf {g}\mathbf {g}^T &{} 0 \\ 0 &{} F_\uparrow \end{bmatrix} = \begin{bmatrix} T_\uparrow ^T T_\uparrow &{} T_\uparrow ^T S_\uparrow \\ S_\uparrow ^T T_\uparrow &{} S_\uparrow ^T S_\uparrow \end{bmatrix}, \end{aligned}$$
where
$$\begin{aligned} E_\downarrow&:= \mathrm {I}_d - z \cdot \mathrm {diag}( 1 + \epsilon _1^2, 1 + \epsilon _2^2, \ldots , 1 + \epsilon _d^2 ), \nonumber \\ E_\uparrow&:= \mathrm {I}_d + z \cdot \mathrm {diag}( 1 + \epsilon _1^2, 1 + \epsilon _2^2, \ldots , 1 + \epsilon _d^2 ), \nonumber \\ F_\downarrow&:= \mathrm {I}_{\varSigma -1-d} - z \cdot \mathrm {diag}( \underbrace{\delta _1^{-1}, \ldots , \delta _1^{-1}}_{(n_1-2) \text {-times}}\,, \ldots , \underbrace{\delta _d^{-1}, \ldots , \delta _d^{-1}}_{(n_d-2) \text {-times}} ), \text { and} \nonumber \\ F_\uparrow&:= \mathrm {I}_{\varSigma -1-d} + z \cdot \mathrm {diag}( \underbrace{\delta _1^{-1}, \ldots , \delta _1^{-1}}_{(n_1-2) \text {-times}}\,, \ldots , \underbrace{\delta _d^{-1}, \ldots , \delta _d^{-1}}_{(n_d-2) \text {-times}} ). \end{aligned}$$
(50)
Hence, by swapping rows and columns of G, which leaves the value of q unchanged because they are orthogonal operations, we find that \(q(G) = q(G')\) with
$$\begin{aligned} G'&:= \begin{bmatrix} \mathbf {a}_\downarrow ^T \mathbf {a}_\downarrow &{} \mathbf {a}_\downarrow ^T T_\downarrow &{} \mathbf {a}_\downarrow ^T S_\downarrow &{} \mathbf {a}_\downarrow ^T\mathbf {a}_\uparrow &{} \mathbf {a}_\downarrow ^T T_\uparrow &{} \mathbf {a}_\downarrow ^T S_\uparrow \\ T_\downarrow ^T \mathbf {a}_\downarrow &{} T_\downarrow ^T T_\downarrow &{} T_\downarrow ^T S_\downarrow &{} T_\downarrow ^T\mathbf {a}_\uparrow &{} T_\downarrow ^T T_\uparrow &{} T_\downarrow ^T S_\uparrow \\ S_\downarrow ^T \mathbf {a}_\downarrow &{} S_\downarrow ^T T_\downarrow &{} S_\downarrow ^T S_\downarrow &{} S_\downarrow ^T\mathbf {a}_\uparrow &{} S_\downarrow ^T T_\uparrow &{} S_\downarrow ^T S_\uparrow \\ \mathbf {a}_\uparrow ^T \mathbf {a}_\downarrow &{} \mathbf {a}_\uparrow ^T T_\downarrow &{} \mathbf {a}_\uparrow ^T S_\downarrow &{} \mathbf {a}_\uparrow ^T\mathbf {a}_\uparrow &{} \mathbf {a}_\uparrow ^T T_\uparrow &{} \mathbf {a}_\uparrow ^T S_\uparrow \\ T_\uparrow ^T \mathbf {a}_\downarrow &{} T_\uparrow ^T T_\downarrow &{} T_\uparrow ^T S_\downarrow &{} T_\uparrow ^T\mathbf {a}_\uparrow &{} T_\uparrow ^T T_\uparrow &{} T_\uparrow ^T S_\uparrow \\ S_\uparrow ^T \mathbf {a}_\downarrow &{} S_\uparrow ^T T_\downarrow &{} S_\uparrow ^T S_\downarrow &{} S_\uparrow ^T\mathbf {a}_\uparrow &{} S_\uparrow ^T T_\uparrow &{} S_\uparrow ^T S_\uparrow \end{bmatrix} \nonumber \\&= \begin{bmatrix} 1 + z &{} z \mathbf {g}^T &{} 0 &{} 0 &{} 0 &{} 0 \\ z \mathbf {g} &{} E_\downarrow + z \mathbf {g} \mathbf {g}^T &{} 0 &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} F_\downarrow &{} 0 &{} 0 &{} 0 \\ 0 &{} 0 &{} 0 &{} 1 -z &{} -z \mathbf {g}^T &{} 0 \\ 0 &{} 0 &{} 0 &{} -z \mathbf {g} &{} E_\uparrow - z \mathbf {g} \mathbf {g}^T &{} 0 \\ 0 &{} 0 &{} 0 &{} 0 &{} 0 &{} F_\uparrow \end{bmatrix}. \end{aligned}$$
(51)

Bounding q(G)

To simplify more, we write
$$\begin{aligned} G_\downarrow := \begin{bmatrix} 1 &{} 0 \\ 0 &{} E_\downarrow \end{bmatrix},\quad G_\uparrow := \begin{bmatrix} 1 &{} 0 \\ 0 &{} E_\uparrow \end{bmatrix},\quad \text { and }\quad \mathbf {h} := \begin{bmatrix} 1 \\ \mathbf {g}\end{bmatrix}, \end{aligned}$$
so that
$$\begin{aligned} q(G) = q(G') = q\left( \begin{bmatrix} G_\downarrow + z\mathbf {h}\mathbf {h}^T &{} 0 &{} 0 &{} 0 \\ 0 &{} G_\uparrow - z\mathbf {h}\mathbf {h}^T &{} 0 &{} 0\\ 0 &{} 0 &{} F_\downarrow &{} 0 \\ 0 &{} 0 &{} 0 &{} F_\uparrow \end{bmatrix}\right) , \end{aligned}$$
where we swapped some rows and columns again. Because the smallest singular value of the matrix \(G_\uparrow - z\mathbf {h}\mathbf {h}^T\) is larger than or equal to the smallest singular value of the positive semidefinite matrix \(G'\), we obtain the bound
$$\begin{aligned} q(G) \ge q( G_\uparrow - z\mathbf {h}\mathbf {h}^T ) \det ( G_\downarrow + z\mathbf {h}\mathbf {h}^T ) \det (F_\downarrow ) \det (F_\uparrow ). \end{aligned}$$
(52)
We now obtain bounds on the individual factors on the right-hand side of Eq. (52). First, we compute the determinants of the diagonal matrices:
$$\begin{aligned} \det (F_\downarrow ) \det (F_\uparrow ) = \prod _{i=1}^d \bigl ( (1-z \delta _i^{-1}) (1 + z \delta _i^{-1}) \bigr )^{n_i-2} \ge \prod _{i=1}^d ( 1-z \delta _i^{-1} )^{n_i-2}, \end{aligned}$$
having used \(0 < z = \delta _1 \cdots \delta _d \le 1\). Next, we see that
$$\begin{aligned} 1 - z \delta _i^{-1} \ge 1 - \delta _{\max }^{d-1} \ge 1 - \delta _{\max } > \left( \frac{9}{10}\right) ^2(1 - \delta _1), \text { where } \delta _{\max } := \max \{\delta _1, \ldots , \delta _d \}. \end{aligned}$$
Note that we used \(2 \left( \frac{9}{10}\right) ^2 (1-\delta _1) = \left( \frac{9}{10}\right) ^2 \Vert \mathbf {u}^1 - \mathbf {v}^1 \Vert ^2 < \Vert \mathbf {u}^k - \mathbf {v}^k \Vert ^2 = 2 (1-\delta _k)\) in the last inequality. As a result, we obtain
$$\begin{aligned} \det (F_\downarrow ) \det (F_\uparrow ) > \left( \frac{81}{100}\right) ^{\varSigma -d-1} (1-\delta _1)^{\varSigma -d-1} \ge \left( \frac{1-\delta _1}{2} \right) ^{\varSigma -d-1}. \end{aligned}$$
(53)
The final determinant in Eq. (52) can be bounded as follows. Note that \(z \mathbf {h} \mathbf {h}^T\) is a symmetric matrix with one positive eigenvalue and all others zero. It therefore follows from Weyl's inequalities [51, Theorem 4.3.7] that the eigenvalues of \(G_\downarrow \) cannot decrease when \(z \mathbf {h} \mathbf {h}^T\) is added. Hence,
$$\begin{aligned} \det ( G_\downarrow + z\mathbf {h}\mathbf {h}^T ) \ge \det (G_\downarrow ) = \det (E_\downarrow ) = \prod _{i=1}^d (1 - z(1+\epsilon _i^2)). \end{aligned}$$
Next, we bound \(1 - z(1+\epsilon _i^2)\) from below and above. From (40) and (41), we have for some universal constant \(C>0\):
$$\begin{aligned} 1-z(1+\epsilon _i^2)&\le 1-(1-C\epsilon _1^2)^{d+1}\\&=(1-(1-C\epsilon _1^2))(1+(1-C\epsilon _1^2)+\cdots +(1-C\epsilon _1^2)^d)\nonumber \\&\le C(d+1)\epsilon _1^2.\nonumber \end{aligned}$$
(54)
For obtaining the lower bound, note that
$$\begin{aligned} z&= \delta _1\cdots \delta _d \le \sqrt{\frac{1}{(1+\epsilon _1^2)(1+\epsilon _2^2)(1+\epsilon _3^2)}} \le \sqrt{\frac{1}{1+\epsilon _1^2} \frac{1}{(1+(9/10)^4 \epsilon _1^2)^2}} \\&= \frac{1}{1+(9/10)^4 \epsilon _1^2} \sqrt{\frac{1}{1+\epsilon _1^2}}, \end{aligned}$$
so that
$$\begin{aligned} 1-z(1+\epsilon _i^2)\ge 1-\frac{\sqrt{1+\epsilon _1^2}}{1+(9/10)^4\epsilon _1^2}\ge \frac{\epsilon _1^2}{8}. \end{aligned}$$
(55)
By Eq. (40), \(\epsilon _1^2 \ge 2(1 - \delta _1)\), so that
$$\begin{aligned} \det ( G_\downarrow + z\mathbf {h}\mathbf {h}^T ) \ge \left( \frac{2}{8}\right) ^d (1 - \delta _1)^d = 2^{-d} \left( \frac{1 - \delta _1}{2}\right) ^d. \end{aligned}$$
(56)
The proof can be completed by a fortunate application of Geršgorin’s theorem [43, Satz III]. According to this theorem, the eigenvalues of
$$\begin{aligned} G_\uparrow - z\mathbf {h}\mathbf {h}^T = \begin{bmatrix} 1 - z &{} -z\epsilon _1 &{} -z\epsilon _2 &{} \cdots &{} -z\epsilon _d \\ -z\epsilon _1 &{} 1 + z &{} -z\epsilon _1 \epsilon _2 &{} \cdots &{} -z\epsilon _1 \epsilon _d \\ -z\epsilon _2 &{} -z\epsilon _1 \epsilon _2 &{} \ddots &{} &{} \vdots \\ \vdots &{} \vdots &{} &{} 1+z &{} -z\epsilon _{d-1}\epsilon _d \\ -z\epsilon _d &{} -z\epsilon _1 \epsilon _d &{} \cdots &{} -z\epsilon _{d-1}\epsilon _d &{} 1+z \end{bmatrix} \end{aligned}$$
are contained in the following Geršgorin disks
$$\begin{aligned} \mathrm {disk}_0&:= \Bigl \{ x \in \mathbb {C}\;\;|\;\; | (1-z) - x | \le z \sum _{k=1}^d \epsilon _k \Bigr \}, \text { and}\\ \mathrm {disk}_i&:= \Bigl \{ x \in \mathbb {C}\;\;|\;\; | (1+z) - x | \le z\epsilon _i + z\epsilon _i \sum _{j\ne i} \epsilon _j\Bigr \}, \quad i=1, \ldots , d. \end{aligned}$$
Recall from Eq. (42) that \(z = \delta _1\cdots \delta _d\) and from Eq. (39) that \(\delta _k = \sqrt{\tfrac{1}{1+\epsilon _k^2}}\). By choosing \(\epsilon = \epsilon _1\) small enough, we can get z as close to 1 and \(\epsilon _k \le \epsilon _1\) as close to 0 as we want. Therefore, if we choose \(\epsilon \) small enough, \(\mathrm {disk}_0\) is a small disk near 0, and \(\mathrm {disk}_1, \ldots , \mathrm {disk}_d\) are small, pairwise overlapping disks near 2. When \(\epsilon \) is sufficiently small, \(\mathrm {disk}_0\) is disjoint from the other disks, so that \(\mathrm {disk}_0\) contains exactly one eigenvalue close to 0, and \(\bigcup _{i=1}^d \mathrm {disk}_i\) contains d eigenvalues close to 2. Furthermore, since we are dealing with symmetric matrices, all the eigenvalues are real. Therefore, for sufficiently small \(\epsilon \),
$$\begin{aligned} q(G_\uparrow - z \mathbf {h}\mathbf {h}^T) \ge (1 + z - d \epsilon _1)^d \ge 1. \end{aligned}$$
(57)
Finally, plugging Eqs. (53)–(57) into Eq. (52), we find
$$\begin{aligned} q(G) \ge 2^{-d} \left( \frac{1-\delta _1}{2} \right) ^{\varSigma -1}. \end{aligned}$$
This finishes the proof.

Proof of Lemma 9

For the proof of Lemma 9, we need the following two simple results. The first one is well known.
Lemma 16
For \(1\le k\le d\), let \(\mathcal {A}_k \subset \mathbb {R}^{n_k}\) be a linear subspace of dimension \(m_k\) and let \(\mathcal {A}_k^\perp \) be its orthogonal complement. Put \(\mathcal {V} := \mathcal {A}_1 \otimes \cdots \otimes \mathcal {A}_d\) and \(\mathcal {W} := \mathcal {A}_1^\perp \otimes \cdots \otimes \mathcal {A}_d^\perp \) and let https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq910_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq911_HTML.gif . Then, the tangent spaces https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq912_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq913_HTML.gif are orthogonal to each other.
Proof
Let us write https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq914_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq915_HTML.gif . By Eq. (9), all vectors of the form \(\mathbf {a}^1\otimes \cdots \otimes \mathbf {a}^{k-1}\otimes \mathbf {v}\otimes \mathbf {a}^{k+1}\otimes \cdots \otimes \mathbf {a}^d\) for \(1\le k\le d\) and where \(\mathbf {v}\in \mathbb {R}^{n_k}\) constitute a generating set of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq919_HTML.gif . A similar statement holds for https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq920_HTML.gif . Then, by Eq. (10), the inner product between two such generators \(\mathbf {t}:=\mathbf {a}^1\otimes \cdots \otimes \mathbf {a}^{k-1}\otimes \mathbf {v}\otimes \mathbf {a}^{k+1}\otimes \cdots \otimes \mathbf {a}^d\) and \(\mathbf {s}:=\mathbf {b}^1\otimes \cdots \otimes \mathbf {b}^{\ell -1}\otimes \mathbf {w}\otimes \mathbf {b}^{\ell +1}\otimes \cdots \otimes \mathbf {b}^d\) is
$$\begin{aligned} \langle \mathbf {t}, \mathbf {s}\rangle = {\left\{ \begin{array}{ll} \langle \mathbf {v}, \mathbf {w}\rangle \, \prod _{j\ne k} \langle \mathbf {a}^j, \mathbf {b}^j\rangle , &{} \text { if } k = \ell \\ \langle \mathbf {a}^\ell , \mathbf {w}\rangle \,\langle \mathbf {v}, \mathbf {b}^k\rangle \, \prod _{j\ne k,\ell } \langle \mathbf {a}^j, \mathbf {b}^j\rangle , &{} \text { if } k \ne \ell \end{array}\right. } \;= 0, \end{aligned}$$
because \(d\ge 3\) and \(\langle \mathbf {a}^j, \mathbf {b}^j \rangle =0\) for all \(1\le j\le d\). This completes the proof. \(\square \)
Lemma 17
For all \(1\le k\le d\), let \(\mathcal {A}_k\subset \mathbb {R}^{n_k}\) be a linear subspace of dimension \(m_k\), and define the tensor space \(\mathcal {V} := \mathcal {A}_1\otimes \cdots \otimes \mathcal {A}_d\). Then, the following holds.
1.
\(\mathcal {S}_{n_1,\ldots ,n_d} \cap \mathcal {V}\) is a manifold.
 
2.
For \(1\le k\le d\), let \(U_k\in \mathbb {R}^{n_k\times m_k}\) be a matrix whose columns form an orthonormal basis for \(\mathcal {A}_k\). The map
$$\begin{aligned} \mathcal {S}_{m_1,\ldots ,m_d} \rightarrow \mathcal {S}_{n_1,\ldots ,n_d} \cap \mathcal {V},\; \mathbf {a}^1\otimes \cdots \otimes \mathbf {a}^d \mapsto (U_1 \mathbf {a}^1)\otimes \cdots \otimes (U_d \mathbf {a}^d) \end{aligned}$$
is an isometry.
 
Proof
Let us write \(U_1\otimes \cdots \otimes U_d\) for the map from (2). It extends to a linear map \(\mathbb {R}^{m_1}\otimes \cdots \otimes \mathbb {R}^{m_d} \rightarrow \mathbb {R}^{n_1}\otimes \cdots \otimes \mathbb {R}^{n_d}\) because of the universal property [44, Chapter 1]. By definition, we have \((U_1\otimes \cdots \otimes U_d)(\mathcal {S}_{m_1,\ldots ,m_d}) = \mathcal {S}_{n_1,\ldots ,n_d} \cap \mathcal {V}\) and \((U_1^T\otimes \cdots \otimes U_d^T)(\mathcal {S}_{n_1,\ldots ,n_d} \cap \mathcal {V}) = \mathcal {S}_{m_1,\ldots ,m_d}\). Hence, \(\mathcal {S}_{n_1,\ldots ,n_d} \cap \mathcal {V}\) is the image of a manifold under an invertible linear map. This implies that \(\mathcal {S}_{n_1,\ldots ,n_d} \cap \mathcal {V}\) itself is a manifold. Furthermore, U preserves the Euclidean inner product, and so the manifolds \(\mathcal {S}_{n_1,\ldots ,n_d} \cap \mathcal {V}\) and \(\mathcal {S}_{m_1,\ldots ,m_d}\) are isometric. \(\square \)
We are now ready to prove Lemma 9.
Proof of Lemma 9
We assumed that \(\sigma _{r-2; n_1-2,\ldots ,n_d-2}\subset \mathbb {R}^{n_1-2}\otimes \cdots \otimes \mathbb {R}^{n_d-2}\) is generically complex identifiable. Then, Proposition 1(2) tells us that \(\mathcal {M}_{r-2;n_1-2,\ldots ,n_d-2}\) is Zariski-open in \((\mathcal {S}_{n_1-2,\ldots ,n_d-2})^{\times (r-2)}\). Fix any tuple https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq947_HTML.gif . By definition of \(\mathcal {M}_{r-2;n_1-2,\ldots ,n_d-2}\), the least singular value of the derivative https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq949_HTML.gif of the addition map \(\varPhi : (\mathcal {S}_{n_1-2,\ldots ,n_d-2})^{\times (r-2)}\rightarrow \mathcal {N}_{r-2;n_1-2,\ldots ,n_d-2}\) is positive. This implies that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq951_HTML.gif Hence, if \(X_i\) is a matrix whose columns form an orthonormal basis for https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq953_HTML.gif , we have
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ218_HTML.png
Moreover, if we write https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq954_HTML.gif with \(\Vert \mathbf {x}_i^k\Vert =1\) and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq956_HTML.gif , then we have
$$\begin{aligned} \mathrm {vol}\begin{bmatrix} \mathbf {x}_1^{h_1} \otimes \cdots \otimes \mathbf {x}_1^{h_{d-1}}&\cdots&\mathbf {x}_{r-2}^{h_1} \otimes \cdots \otimes \mathbf {x}_{r-2}^{h_{d-1}} \end{bmatrix}>0 \end{aligned}$$
for all subsets \(\{h_1,\ldots ,h_{d-1}\}\subset \{1,\ldots ,d\}\) of cardinality \(d-1\). Indeed, suppose to the contrary that the foregoing matrix had linearly dependent columns for some subset; w.l.o.g., we can assume \(h_i = i\) for \(i=1,\ldots ,d-1\). Then, there are \(\lambda _i\), not all zero, such that \(\sum _{i=1}^{r-2} \lambda _i \mathbf {x}_{i}^{1}\otimes \cdots \otimes \mathbf {x}_{i}^{d-1} = 0\). Tensoring with an arbitrary vector \(\mathbf {v}\in \mathbb {R}^{n_d-2}\) yields \(\sum _{i=1}^{r-2} \lambda _i \mathbf {x}_{i}^{1}\otimes \cdots \otimes \mathbf {x}_{i}^{d-1} \otimes \mathbf {v} = 0 \otimes \mathbf {v} = 0\), having exploited multilinearity. Note that the ith term in the last sum lives in https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq965_HTML.gif and can be obtained as a linear combination of the columns of the ith block of Terracini’s matrix, so that Eq. (12) implies that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq966_HTML.gif , which contradicts https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq967_HTML.gif . We define the following constant, which will play an important role in this proof:
$$\begin{aligned} \mu := \frac{\mathrm {vol}\left( \begin{bmatrix} X_1&\cdots&X_{r-2}\end{bmatrix}\right) }{2}\prod _{1\le h_1<\cdots <h_{d-1}\le d} \mathrm {vol}\begin{bmatrix} \mathbf {x}_1^{h_1} \otimes \cdots \otimes \mathbf {x}_1^{h_{d-1}}&\cdots&\mathbf {x}_{r-2}^{h_1} \otimes \cdots \otimes \mathbf {x}_{r-2}^{h_{d-1}} \end{bmatrix}. \end{aligned}$$
It is important to note that \(\mu > 0\) is only defined through the choice of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq969_HTML.gif . The key observation is that we can choose \(\mu \) independently of what follows.
Recall that the restriction of \(\phi \) to \(\mathcal {N}_{2;n_1,\ldots ,n_d} \times (\mathcal {S}_{n_1,\ldots ,n_d})^{\times (r-2)}\) is
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ219_HTML.png
In the terminology of Breiding and Vannieuwenhoven [20], \(\mathcal {N}_r^*\) is the join of \(\mathcal {N}_{2;n_1,\ldots ,n_d}\) and \(r-2\) copies of \(\mathcal {S}_{n_1,\ldots ,n_d}\).
Let https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq977_HTML.gif . By construction, the tensor https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq978_HTML.gif has multilinear rank bounded by \((2,\ldots ,2)\), meaning that there exists a tensor subspace \(\mathcal {V} = \mathcal {A}_1 \otimes \cdots \otimes \mathcal {A}_d \subset \mathbb {R}^{n_1} \otimes \cdots \otimes \mathbb {R}^{n_d}\) where the linear subspace \(\mathcal {A}_k \subset \mathbb {R}^{n_k}\) has \(\dim \mathcal {A}_k = 2\) and such that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq983_HTML.gif . For each \(1\le k\le d\), we denote the orthogonal complement of \(\mathcal {A}_k\) in \(\mathbb {R}^{n_k}\) by \(\mathcal {A}_k^\perp \). Let us define \(\mathcal {W} := \mathcal {A}_1^\perp \otimes \cdots \otimes \mathcal {A}_d^\perp \).
Now, we make the following choice of rank-1 tensors https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq989_HTML.gif : for \(1\le k \le d\), let \(U_k \in \mathbb {R}^{n_k \times (n_k-2)}\) be a matrix whose columns form an orthonormal basis of \(\mathcal {A}_k^\perp \). Then, by Lemma 17, the map
$$\begin{aligned} U: \mathcal {S}_{n_1-2,\ldots ,n_d-2} \rightarrow \mathcal {S}_{n_1,\ldots ,n_d} \cap \mathcal {W},\; \mathbf {a}^1\otimes \cdots \otimes \mathbf {a}^d \mapsto (U_1 \mathbf {a}^1)\otimes \cdots \otimes (U_d \mathbf {a}^d) \end{aligned}$$
is an isometry of manifolds. For \(1\le i\le r-2\), we define https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq994_HTML.gif , where https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq995_HTML.gif are the rank-1 tensors from above. We write https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq996_HTML.gif . Because U is an isometry, https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq997_HTML.gif , so we can choose \(\Vert \mathbf {a}^1_i \Vert = \cdots =\Vert \mathbf {a}^d_i\Vert = 1\). The plan for the rest of the proof is to show that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq999_HTML.gif is a finite value which does not depend on https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1000_HTML.gif , and that there is a neighborhood around https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1001_HTML.gif , whose size is also independent of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1002_HTML.gif , on which the Jacobian does not deviate too much.
For showing this, we first compute the derivative of \(\phi \) at the point https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1004_HTML.gif :
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ220_HTML.png
where https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1005_HTML.gif . Let \(V\in \mathbb {R}^{\varPi \times 2\varSigma }\) be a matrix whose columns form an orthonormal basis of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1007_HTML.gif , and for \(1\le i\le r-2\) let \(W_i\in \mathbb {R}^{\varPi \times \varSigma }\) be a matrix whose columns form an orthonormal basis of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1010_HTML.gif . Then, we have
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ58_HTML.png
(58)
Let us write https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1011_HTML.gif with https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1012_HTML.gif . From [25, Corollary 2.2], we know that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1013_HTML.gif . Moreover, by Eq. (11), https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1014_HTML.gif . Since https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1015_HTML.gif are, by assumption, elements of \(\mathcal {W}\), Lemma 16 implies that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1017_HTML.gif is orthogonal to https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1018_HTML.gif for all \(1\le i\le r-2\). Therefore, we get the following equation for Eq. (58):
$$\begin{aligned} \mathrm {vol}(\begin{bmatrix} V&W_1&\cdots&W_{r-2}\end{bmatrix}) = \mathrm {vol}(V)\,\mathrm {vol}(\begin{bmatrix} W_1&\cdots&W_{r-2}\end{bmatrix}) = \mathrm {vol}(\begin{bmatrix} W_1&\cdots&W_{r-2}\end{bmatrix}), \end{aligned}$$
where the last equality holds because V has orthonormal columns. Let us further investigate the \(W_i\).
By Eq. (9), the tangent space of \(\mathcal {S}_{n_1,\ldots ,n_d}\) at https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1022_HTML.gif is
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ221_HTML.png
Moreover, by Lemma 17, \(\mathcal {S}_{n_1,\ldots ,n_d} \cap \mathcal {W}\) is a manifold, and its tangent space at https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1024_HTML.gif is
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ222_HTML.png
For all \(1\le k\le d\), let \(\{\mathbf {t}^k,\mathbf {s}^k\}\) be an orthonormal basis of \(\mathcal {A}_k\), and let us write
$$\begin{aligned} \mathcal {K}_i&:= \mathrm {span}\{\mathbf {t}^1\otimes \mathbf {a}_i^2 \otimes \cdots \otimes \mathbf {a}_i^d, \ldots , \mathbf {a}_i^1 \otimes \cdots \otimes \mathbf {a}_i^{d-1} \otimes \mathbf {t}^d\}, \text { and }\\ \mathcal {L}_i&:= \mathrm {span}\{\mathbf {s}^1\otimes \mathbf {a}_i^2 \otimes \cdots \otimes \mathbf {a}_i^d, \ldots , \mathbf {a}_i^1 \otimes \cdots \otimes \mathbf {a}_i^{d-1} \otimes \mathbf {s}^d\}. \end{aligned}$$
Because \(\mathbf {a}_i^k\in \mathcal {A}_k^\perp \) for \(1\le i\le r-2\), and because \(\Vert \mathbf {a}_i^j\Vert =1\), the listed tensors in fact form orthogonal bases for \(\mathcal {K}_i\) and \(\mathcal {L}_i\), respectively. Furthermore, we have for all pairs of indices i, j that
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ223_HTML.png
Therefore, we have the following orthogonal decomposition:
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ224_HTML.png
The columns of \(UX_i\) form an orthonormal basis of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1034_HTML.gif . Altogether, we have
$$\begin{aligned}&\mathrm {vol}\left( \begin{bmatrix} W_1&\cdots&W_{r-2}\end{bmatrix}\right) \\&\quad = \mathrm {vol}\left( \begin{bmatrix} UX_1&\cdots&UX_{r-2}\end{bmatrix}\right) \prod \mathrm {vol}\left( \begin{bmatrix}\mathbf {a}_1^{h_1} \otimes \cdots \otimes \mathbf {a}_1^{h_{d-1}}&\cdots&\mathbf {a}_{r-2}^{h_1} \otimes \cdots \otimes \mathbf {a}_{r-2}^{h_{d-1}}\end{bmatrix}\right) ^2\\&\quad = \mathrm {vol}\left( \begin{bmatrix} X_1&\cdots&X_{r-2}\end{bmatrix}\right) \prod \mathrm {vol}\left( \begin{bmatrix}\mathbf {x}_1^{h_1} \otimes \cdots \otimes \mathbf {x}_1^{h_{d-1}}&\cdots&\mathbf {x}_{r-2}^{h_1} \otimes \cdots \otimes \mathbf {x}_{r-2}^{h_{d-1}}\end{bmatrix}\right) ^2, \end{aligned}$$
where both products range over \(1\le h_1<\cdots <h_{d-1}\le d\). This implies that
$$\begin{aligned} \mathrm {vol}(\begin{bmatrix} W_1&\cdots&W_{r-2}\end{bmatrix}) =2 \mu \end{aligned}$$
is independent of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1036_HTML.gif .
The rest of the proof is a variational argument: Let us denote by \(\mathrm {Gr}(\varSigma ,\varPi )\) the Grassmann manifold of \(\varSigma \)-dimensional linear spaces in \(\mathbb {R}^\varPi \). We endow \(\mathrm {Gr}(\varSigma ,\varPi )\) with the standard Riemannian metric, such that the distance between two spaces is the Euclidean length of the vector of principal angles [14]. Let us denote this distance by \(d(\cdot ,\cdot )\). Furthermore, let https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1042_HTML.gif be the Gauss map.
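For concreteness, this distance can be evaluated numerically from the principal angles between two subspaces; the following small sketch uses the standard SVD-based computation (with illustrative dimensions), not anything specific to this paper.
```python
import numpy as np

rng = np.random.default_rng(6)
Pi, Sigma = 20, 4                                  # illustrative ambient and subspace dimensions

Q1, _ = np.linalg.qr(rng.standard_normal((Pi, Sigma)))   # orthonormal bases of two Sigma-planes
Q2, _ = np.linalg.qr(rng.standard_normal((Pi, Sigma)))

# principal angles are the arccosines of the singular values of Q1^T Q2
cosines = np.clip(np.linalg.svd(Q1.T @ Q2, compute_uv=False), -1.0, 1.0)
principal_angles = np.arccos(cosines)
dist = np.linalg.norm(principal_angles)            # Euclidean length of the vector of principal angles
print(principal_angles, dist)
```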
From Breiding and Vannieuwenhoven [23, Proposition 4.3], we get for each \(1\le i\le r-2\) that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1044_HTML.gif . This means that for \(\epsilon >0\) and any tuple https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1046_HTML.gif satisfying https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1047_HTML.gif for \(1\le i\le r-2\), we have https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1049_HTML.gif for sufficiently small \(\epsilon \). Let \(W_i'\in \mathbb {R}^{\varPi \times \varSigma }\) be a matrix with orthonormal columns that span https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1052_HTML.gif . Consequently, there is a constant \(K>0\), which depends only on \(n_1,\ldots ,n_d\), such that \(\Vert W_i - W_i'\Vert <\epsilon K\sqrt{\varSigma }\).
Recall that the columns of \(\left[ {\begin{matrix} W_1&\cdots&W_{r-2}\end{matrix}}\right] \) are orthogonal to the columns of V. Moreover, \(r\varSigma \le \varPi \), since we have assumed that \(\sigma _{r;n_1,\ldots ,n_d}\) is generically complex identifiable. Hence, there is a matrix \(M\in \mathbb {R}^{\varPi \times \varPi }\) with \(MV=V\) and \(MW_i=W_i'\) for \(1\le i\le r-2\) that leaves the orthogonal complement of \(V+W_1+\cdots +W_{r-2}\) fixed. This implies
$$\begin{aligned} \Vert I_\varPi -M\Vert _2=\Vert (I_\varPi -M)\left[ {\begin{matrix} W_1&\cdots&W_{r-2}\end{matrix}}\right] \Vert _2 \le \sum _{i=1}^{r-2}\Vert W_i-W_i'\Vert < \epsilon rK\sqrt{\varSigma }, \end{aligned}$$
where \(I_\varPi \) is the \(\varPi \times \varPi \) identity matrix. Since every eigenvalue of M then lies within distance \(\epsilon rK\sqrt{\varSigma }\) of 1, we have \(\det M \ge (1-\epsilon rK\sqrt{\varSigma })^{\varPi }\). Therefore, if we choose \(\epsilon \) small enough, we may assume \(\det M > \tfrac{1}{2}\). Note that such a choice of \(\epsilon \) is independent of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1069_HTML.gif .
Altogether, we have shown that for all tuples https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1070_HTML.gif satisfying https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1071_HTML.gif we have that
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ225_HTML.png
and both \(\epsilon \) and \(\mu \) have been chosen independent of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1074_HTML.gif . This concludes the proof. \(\square \)

Proofs of the lemmata in Section 5

Proof of Lemma 11

The proof is similar to the proof of Lemma 3. Integrating in polar coordinates, we have
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ226_HTML.png
Note that the argument of q is a \(\varPi \times (2\varSigma -2)\) matrix. Since \(q(A) = \varsigma _1(A) \cdots \varsigma _{n-1}(A)\) for \(A \in \mathbb {R}^{m \times n}\) with \(n\le m\), we have
$$\begin{aligned} q\left( (\mathrm {I}-M M^\dagger )\begin{bmatrix} \rho \cos (\theta )L_1&\rho \sin (\theta )L_2\end{bmatrix}\right) = \rho ^{2\varSigma -3}\, q\left( (\mathrm {I}-MM^\dagger )\begin{bmatrix} \cos (\theta )L_1&\sin (\theta )L_2\end{bmatrix}\right) . \end{aligned}$$
This yields
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ227_HTML.png
The change of variables https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1080_HTML.gif transforms the integral for \(\rho \) into
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ228_HTML.png
Plugging the foregoing into the expression for \(J_\text {inner}\) concludes the proof. \(\square \)
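The homogeneity \(q(\rho A)=\rho ^{\,n-1}\,q(A)\) used above (with \(n=2\varSigma -2\) columns) is easy to confirm numerically; the following is a minimal sketch with illustrative sizes.
```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 20, 8                            # illustrative sizes with n <= m
A = rng.standard_normal((m, n))
rho = 2.7

def q(M):
    # product of all but the smallest singular value, cf. Eq. (8)
    s = np.linalg.svd(M, compute_uv=False)
    return np.prod(s[:-1])

# scaling the matrix by rho scales q by rho**(n-1)
print(np.isclose(q(rho * A), rho**(n - 1) * q(A)))   # True
```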

Proof of Lemma 12

First, note that the left-hand term in (34) is bounded above by a constant depending on \(n_1,\ldots ,n_d,d\). Thus, by choosing an appropriate constant K in (34) we can assume that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1085_HTML.gif is smaller than any predefined quantity. We thus assume from now on that https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1086_HTML.gif for some \(\epsilon >0\) that can be chosen as small as desired. Furthermore, as in Eq. (39), we write
$$\begin{aligned} \delta _k := \langle \mathbf {u}^k,\mathbf {v}^k\rangle = \sqrt{\tfrac{1}{\epsilon _k^2+1}}, \quad \epsilon _k := \Vert \mathbf {u}^k - \mathbf {v}^k\Vert = \sqrt{\tfrac{1}{\delta _k^2}-1}, \quad \text {and}\quad z := \delta _1\cdots \delta _d. \end{aligned}$$
We also assume that \(\delta _1 = \min \{\delta _1,\ldots ,\delta _d\}\), or, equivalently, \(\epsilon _1 = \max \{\epsilon _1,\ldots ,\epsilon _d\}\). Note that if \(\epsilon \approx 0\), then one can assume \(\epsilon _k \approx 0\) and \(\delta _k\approx 1\) for all \(1\le k\le d\). In particular, all inequalities from the proof of Lemma 6 in Appendix A.3 are still valid here.

Dropping the Scaling

For simplifying notation, we abbreviate
$$\begin{aligned} A = (\mathrm {I}-MM^\dagger )\begin{bmatrix} L_1&L_2\end{bmatrix} \quad \text {and}\quad B = \mathrm {diag}(\underbrace{\cos (\theta ), \ldots , \cos (\theta )}_{(\varSigma -1) \text { times}}, \underbrace{\sin (\theta ), \ldots , \sin (\theta )}_{(\varSigma -1) \text { times}}), \end{aligned}$$
so that \(AB = (\mathrm {I}-MM^\dagger )\begin{bmatrix} \cos (\theta ) L_1&\sin (\theta ) L_2\end{bmatrix}\). Observe that, by definition, A ultimately depends on https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1095_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1096_HTML.gif ; that is, https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1097_HTML.gif .
Recall from Eq. (8) that \(q(\cdot )\) is the product of all but the smallest singular value of its argument. We first show that \(q(AB)\le q(A)\). To see this, let \(\varsigma _1 \ge \cdots \ge \varsigma _{2(\varSigma -1)} \ge 0\) be the singular values of A. Then,
$$\begin{aligned} \det ((AB)^T (AB)) = \det (BB^T) \det (A^TA) = (\varsigma _1\cdots \varsigma _{2(\varSigma -1)})^2 (\cos (\theta )\sin (\theta ))^{2(\varSigma -1)}. \end{aligned}$$
This shows
$$\begin{aligned} q(AB) = \frac{\varsigma _1\cdots \varsigma _{2(\varSigma -1)} \cdot (\cos (\theta )\sin (\theta ))^{\varSigma -1}}{\varsigma _{\min }(AB)}, \end{aligned}$$
where \(\varsigma _{\min }(\cdot )\) denotes the smallest singular value; see Eq. (7). Next, we have
$$\begin{aligned} \varsigma _{\min }(AB) = \min _{\mathbf {x}\in \mathbb {S}(\mathbb {R}^{2(\varSigma -1)})}\, \Vert AB \mathbf {x}\Vert&\ge \min _{\mathbf {x}\in \mathbb {S}(\mathbb {R}^{2(\varSigma -1)})}\, \Vert A \mathbf {x}\Vert \quad \cdot \min _{\mathbf {x}\in \mathbb {S}(\mathbb {R}^{2(\varSigma -1)})}\, \Vert B \mathbf {x}\Vert \\&= \varsigma _{2(\varSigma -1)} \cdot \min \{|\cos (\theta )|, |\sin (\theta )|\}. \end{aligned}$$
Moreover, for \(0\le \theta \le \tfrac{\pi }{2}\) we have \(0\le \cos (\theta )\le 1\) and \(0\le \sin (\theta )\le 1\). Altogether, this implies \(q(AB)\le \varsigma _1\cdots \varsigma _{2(\varSigma -1)-1} = q(A)\). In the rest of the proof, we bound the \(\varsigma _i\)’s.
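The inequality \(q(AB)\le q(A)\) derived above can be checked numerically for random data; the sketch below uses illustrative sizes, a random matrix A, and a random angle \(\theta \in [0,\tfrac{\pi }{2}]\).
```python
import numpy as np

rng = np.random.default_rng(2)
Pi, Sigma = 30, 5                       # illustrative sizes
A = rng.standard_normal((Pi, 2 * (Sigma - 1)))
theta = rng.uniform(0, np.pi / 2)
B = np.diag(np.concatenate([np.full(Sigma - 1, np.cos(theta)),
                            np.full(Sigma - 1, np.sin(theta))]))

def q(M):
    # product of all but the smallest singular value, cf. Eq. (8)
    s = np.linalg.svd(M, compute_uv=False)
    return np.prod(s[:-1])

print(q(A @ B) <= q(A))                 # True, as shown in the argument above
```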

Simplifying the Matrix by Orthogonal Transformations

Recall from the proof of Lemma 6 that, after applying the orthogonal transformation in Eq. (44) on the right, we have
$$\begin{aligned} \varsigma _i = \varsigma _i \bigl ( (I-MM^\dagger ) \begin{bmatrix} L_1&L_2 \end{bmatrix} \bigr ) = \varsigma _i \bigl ( (I-MM^\dagger ) \begin{bmatrix} R_\uparrow&R_\downarrow \end{bmatrix} \bigr ), \quad i=1,\ldots ,2(\varSigma -1), \end{aligned}$$
where \(\varsigma _i(A)\) denotes the ith largest singular value of the matrix A. Recalling Eq. (49), we have
$$\begin{aligned} \varsigma _i = \varsigma _i\bigl ( (I-P) \begin{bmatrix} S_\uparrow&S_\downarrow&T_\uparrow&T_\downarrow \end{bmatrix} \bigr ), \end{aligned}$$
where \(P = M M^\dagger \). Let \(\mathbf {a}_\uparrow \) and \(\mathbf {a}_\downarrow \) be as in Eq. (43). The matrix \(M M^\dagger \) projects orthogonally onto the span of https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1112_HTML.gif and https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_IEq1113_HTML.gif , which coincides with the span of the orthonormal vectors \(\frac{\mathbf {a}_\uparrow }{\Vert \mathbf {a}_\uparrow \Vert }\) and \(\frac{\mathbf {a}_\downarrow }{\Vert \mathbf {a}_\downarrow \Vert }\). Moreover, by Eq. (47) we have \(\mathbf {a}_\downarrow ^T \mathbf {a}_\downarrow = 1+z\) and \(\mathbf {a}_\uparrow ^T \mathbf {a}_\uparrow = 1-z\). This shows that
$$\begin{aligned} P = \frac{1}{1+z} \mathbf {a}_\downarrow \mathbf {a}_\downarrow ^T + \frac{1}{1-z} \mathbf {a}_\uparrow \mathbf {a}_\uparrow ^T. \end{aligned}$$
Next, it follows from Eq. (51) that \( P S_\uparrow =0\) and \( P S_\downarrow =0\), so that \(\varsigma _i=\varsigma _i( \widetilde{N} )\), where
$$\begin{aligned} \widetilde{N} := \begin{bmatrix} S_\uparrow&S_\downarrow&(I-P)T_\uparrow&(I-P)T_\downarrow \end{bmatrix}. \end{aligned}$$

Computing the Gram Matrix

Next, we compute the Gram matrix of \(\widetilde{N}\). Consider again Eq. (51), from which all of the following computations follow. We have
$$\begin{aligned} \begin{bmatrix} S_\uparrow ^T S_\uparrow &{} S_\uparrow ^T S_\downarrow \\ S_\downarrow ^T S_\uparrow &{} S_\downarrow ^T S_\downarrow \end{bmatrix} = \begin{bmatrix} F_\uparrow &{} 0 \\ 0 &{} F_\downarrow \end{bmatrix}, \end{aligned}$$
where the F’s are the diagonal matrices from Eq. (50). From the symmetry of P, we obtain \(S_\downarrow ^T P = 0\) and \(S_\uparrow ^T P=0,\) so that
$$\begin{aligned} S_\uparrow ^T (I - P) T_\uparrow&= S_\uparrow ^T T_\uparrow - 0 = 0,&S_\downarrow ^T (I - P) T_\uparrow&= S_\downarrow ^T T_\uparrow - 0 = 0, \\ S_\uparrow ^T (I - P) T_\downarrow&= S_\uparrow ^T T_\downarrow - 0 = 0,&S_\downarrow ^T (I - P) T_\downarrow&= S_\downarrow ^T T_\downarrow - 0 = 0. \end{aligned}$$
We also find
$$\begin{aligned} T_\uparrow ^T (I - P) T_\downarrow =&T_\uparrow ^T T_\downarrow - T_\uparrow ^T \left( \frac{1}{1+z} \mathbf {a}_\downarrow \mathbf {a}_\downarrow ^T + \frac{1}{1-z} \mathbf {a}_\uparrow \mathbf {a}_\uparrow ^T \right) T_\downarrow \\=&0 - \left( 0 + \frac{1}{1-z} T_\uparrow ^T \mathbf {a}_\uparrow \mathbf {a}_\uparrow ^T \right) T_\downarrow \\=&0. \end{aligned}$$
We observe
$$\begin{aligned} T_\uparrow ^T (I - P) T_\uparrow&= T_\uparrow ^T T_\uparrow - T_\uparrow ^T \left( \frac{1}{1+z} \mathbf {a}_\downarrow \mathbf {a}_\downarrow ^T + \frac{1}{1-z} \mathbf {a}_\uparrow \mathbf {a}_\uparrow ^T \right) T_\uparrow \\&= (E_\uparrow - z \mathbf {g}\mathbf {g}^T) - 0 - \frac{1}{1-z} (-z \mathbf {g})(-z \mathbf {g})^T \\&= E_\uparrow - \frac{z}{1-z} \mathbf {g}\mathbf {g}^T. \end{aligned}$$
Analogously, we find
$$\begin{aligned} T_\downarrow ^T (I - P) T_\downarrow = (E_\downarrow + z\mathbf {g}\mathbf {g}^T) - \frac{1}{1+z} (z \mathbf {g})(z\mathbf {g})^T = E_\downarrow + \frac{z}{1 + z} \mathbf {g} \mathbf {g}^T. \end{aligned}$$
Combining all of the foregoing observations results in
$$\begin{aligned} \widetilde{N}^T \widetilde{N} = \begin{bmatrix} F_\uparrow &{} 0 &{} 0 &{} 0 \\ 0 &{} F_\downarrow &{} 0 &{} 0 \\ 0 &{} 0 &{} E_\uparrow - \frac{z}{1-z} \mathbf {g}\mathbf {g}^T &{} 0 \\ 0 &{} 0 &{} 0 &{} E_\downarrow + \frac{z}{1+z} \mathbf {g}\mathbf {g}^T \end{bmatrix}, \end{aligned}$$
so that the singular values of \(\widetilde{N}\) are given by the square roots of the eigenvalues of the matrices on the block diagonal.
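The principle invoked here, namely that a block-diagonal Gram matrix yields singular values equal to the square roots of the eigenvalues of the diagonal blocks, can be illustrated with synthetic blocks whose column spans are mutually orthogonal; sizes below are illustrative.
```python
import numpy as np

rng = np.random.default_rng(4)
m, sizes = 40, (3, 4, 5)                 # illustrative sizes

# build blocks whose column spans are mutually orthogonal
Q, _ = np.linalg.qr(rng.standard_normal((m, sum(sizes))))
splits = np.split(np.arange(sum(sizes)), np.cumsum(sizes)[:-1])
blocks = [Q[:, idx] @ rng.standard_normal((len(idx), len(idx))) for idx in splits]

N = np.hstack(blocks)                    # the Gram matrix N.T @ N is block diagonal
svals = np.sort(np.linalg.svd(N, compute_uv=False))
eigs = np.sort(np.concatenate([np.linalg.eigvalsh(B.T @ B) for B in blocks]))
print(np.allclose(svals**2, eigs))       # singular values are square roots of the block eigenvalues
```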

Bounding the Singular Values

In the remainder, let \(\lambda _i(A)\) denote the ith largest eigenvalue of the positive semidefinite matrix A. Since \(0 < \delta _k \le 1\) for all \(1\le k\le d\), the eigenvalues \(1+z\delta _k^{-1} = 1 + \prod _{j\ne k}\delta _j\), \(1\le k\le d\), of the diagonal matrix \(F_\uparrow \) satisfy
$$\begin{aligned} \lambda _i( F_\uparrow ) \le 2, \quad i=1,\ldots , \varSigma -d-1. \end{aligned}$$
(59)
An upper bound for the eigenvalues of \(F_\downarrow \) is given by
$$\begin{aligned} 1 - z\delta _k^{-1} = 1 - \prod _{1 \le j \ne k \le d} \delta _j \le 1 - \delta _1^{d-1}= (1-\delta _1)(1+\delta _1+\cdots +\delta _1^{d-2})\le d(1-\delta _1), \end{aligned}$$
since \(0 < \delta _1 \le 1\) for sufficiently small \(\epsilon \). Hence, we obtain the bound
$$\begin{aligned} \lambda _i( F_\downarrow ) \le d (1 - \delta _1), \quad i=1,\ldots ,\varSigma -d-1. \end{aligned}$$
(60)
The eigenvalues of \(E_\downarrow + \frac{z}{1+z} \mathbf {g}\mathbf {g}^T\) can be bounded by using that the spectral norm of the rank-1 term is at most \(\Vert \mathbf {g}\Vert ^2 = \sum _{k=1}^d \epsilon _k^2 \le d \epsilon _1^2\), since \(\tfrac{z}{1+z}\le 1\). It then follows from Weyl’s perturbation inequality (see, e.g., [51, Corollary 7.3.8]) and the positive semidefiniteness of \(E_\downarrow + \frac{z}{1+z} \mathbf {g}\mathbf {g}^T\) that
$$\begin{aligned} \lambda _i \left( E_\downarrow + \frac{z}{1+z} \mathbf {g}\mathbf {g}^T \right) \le \lambda _i ( E_\downarrow ) + d \epsilon _1^2. \end{aligned}$$
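A quick numerical illustration of this instance of Weyl’s inequality, with a synthetic diagonal matrix and vector standing in for \(E_\downarrow \) and \(\mathbf {g}\), and t standing in for \(\tfrac{z}{1+z}\), is the following sketch.
```python
import numpy as np

rng = np.random.default_rng(5)
d = 6
E = np.diag(rng.uniform(0.0, 1.0, d))        # a positive semidefinite diagonal matrix
g = rng.standard_normal(d)
t = 0.3                                       # stands in for z/(1+z)

lam_pert = np.sort(np.linalg.eigvalsh(E + t * np.outer(g, g)))[::-1]   # descending
lam_E = np.sort(np.diag(E))[::-1]
# Weyl: lambda_i(E + t*g*g^T) <= lambda_i(E) + t*||g||^2
print(np.all(lam_pert <= lam_E + t * (g @ g) + 1e-12))                 # True
```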
From the definition of \(\delta _1\) and \(\epsilon _1\), we have
$$\begin{aligned} {\epsilon _1^2 \le \frac{3}{2}\Vert \mathbf {u}^1-\mathbf {v}^1\Vert ^2} \le 3(1-\delta _1), \end{aligned}$$
(61)
provided that \(\epsilon \) is sufficiently small. The eigenvalues of the diagonal matrix \(E_\downarrow \) are bounded from above by \(C (d+1) \epsilon _1^2\) due to Eq. (54). Putting all of these together and using \(d \ge 1\), we find
$$\begin{aligned} \lambda _i\left( E_\downarrow + \frac{z}{1+z} \mathbf {g}\mathbf {g}^T \right) \le C' d (1 - \delta _1), \quad i = 1,\ldots , d, \end{aligned}$$
(62)
for some universal constant \(C'>0\).
For computing an upper bound on the eigenvalues of \(E_\uparrow - \frac{z}{1-z} \mathbf {g}\mathbf {g}^T\), we start by noting that
$$\begin{aligned} E_\uparrow - \frac{z}{1-z} \mathbf {g}\mathbf {g}^T = - \frac{z}{1-z} \mathbf {g}\mathbf {g}^T + 2 \mathrm {I}_{d} + \mathrm {diag}\bigl ( z(1+\epsilon _1^2)-1, \ldots , z(1+\epsilon _d^2)-1 \bigr ); \end{aligned}$$
the spectral norm of the last diagonal matrix is bounded by \(C (d+1) \epsilon _1^2\) because of Eq. (54). Adding the matrix \(2\mathrm {I}_d\) causes all eigenvalues to be shifted by 2; hence, it suffices to compute the nonzero eigenvalue of the rank-1 matrix. Its eigenvalues are
$$\begin{aligned} \underbrace{0, \ldots , 0}_{d-1 \text { times}}, \frac{-z}{1-z} \Vert \mathbf {g}\Vert ^2; \end{aligned}$$
the eigenvector corresponding to the last eigenvalue is \(\frac{\mathbf {g}}{\Vert \mathbf {g}\Vert }\). In order to bound that eigenvalue, note that from (55) and (54) we have for some constant \(c=c(d)>0\):
$$\begin{aligned} -z \frac{\Vert \mathbf {g}\Vert ^2}{1 - z}&= \frac{\sum _{k=1}^d \epsilon _k^2}{1-\prod _{k=1}^d\sqrt{\epsilon _k^2+1}} \le -2+c\epsilon _1^2, \end{aligned}$$
where the last inequality is proved as follows. Note that \(\sqrt{1+t^2}\le 1+\frac{t^2}{2}\), which can be verified by squaring both sides: \((1+\tfrac{t^2}{2})^2 = 1+t^2+\tfrac{t^4}{4} \ge 1+t^2\). Then,
$$\begin{aligned} \frac{\sum _{k=1}^d \epsilon _k^2}{-1+\prod _{k=1}^d\sqrt{\epsilon _k^2+1}} \ge \frac{\sum _{k=1}^d \epsilon _k^2}{-1+\prod _{k=1}^d\left( 1+\frac{\epsilon _k^2}{2}\right) } =\frac{\sum _{k=1}^d \epsilon _k^2}{\frac{1}{2}\sum _{k=1}^d \epsilon _k^2+p}, \end{aligned}$$
where \(p=p(\epsilon _1,\ldots ,\epsilon _d)\) is a polynomial expression in the \(\epsilon _i\) of degree and coefficients bounded by a constant depending only on d, with all its monomials of degree at least 4 in \(\epsilon _1,\ldots ,\epsilon _d\). Hence, for some constant \(c=c(d)\) we have \(|p|\le c(d)\epsilon _1^4\) and we conclude that
$$\begin{aligned} \frac{\sum _{k=1}^d \epsilon _k^2}{-1+\prod _{k=1}^d\sqrt{\epsilon _k^2+1}} \ge \frac{2}{1+\frac{2c(d)\epsilon _1^4}{\sum _{k=1}^d \epsilon _k^2}}\ge 2-\frac{4c(d)\epsilon _1^4}{\sum _{k=1}^d \epsilon _k^2}\ge 2-{\hat{c}}(d)\epsilon _1^2, \end{aligned}$$
for some new constant \(\hat{c}(d)\); the last step follows from Eq. (40).
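The behavior established in the last two displays, namely that the ratio \(\sum _k\epsilon _k^2\,/\,(\prod _k\sqrt{\epsilon _k^2+1}-1)\) tends to 2 and deviates from 2 by at most a constant times \(\epsilon _1^2\), can also be observed numerically. The following small sketch uses an arbitrary weight pattern for the \(\epsilon _k\) (purely illustrative).
```python
import numpy as np

weights = np.array([1.0, 0.8, 0.5, 0.3])      # eps_1 is the largest entry by construction
for eps1 in (1e-1, 1e-2, 1e-3):
    eps = eps1 * weights
    ratio = np.sum(eps**2) / (np.prod(np.sqrt(eps**2 + 1.0)) - 1.0)
    # the ratio tends to 2 from below, and (2 - ratio)/eps_1^2 stays bounded
    print(f"{eps1:.0e}  ratio={ratio:.6f}  (2-ratio)/eps1^2={(2 - ratio) / eps1**2:.4f}")
```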
Putting all of the foregoing together with Eq. (61), we have thus shown that \(E_\uparrow - \frac{z}{1-z} \mathbf {g}\mathbf {g}^T\) has eigenvalues satisfying
$$\begin{aligned} \lambda _d\left( E_\uparrow - \frac{z}{1-z} \mathbf {g}\mathbf {g}^T \right)&\le c'(d)(1 - \delta _1), \text { and } \nonumber \\ \lambda _i\left( E_\uparrow - \frac{z}{1-z} \mathbf {g}\mathbf {g}^T \right)&\le 2 + c'(d) (1-\delta _1), \quad i=1,\ldots ,d-1, \end{aligned}$$
(63)
where \(c'(d)\) is again a constant depending only on d.
In Eqs. (59), (60), (62) and (63), we have shown that precisely \(\varSigma -d-1 + d + 1 = \varSigma \) eigenvalues are smaller than some constant times \(1 - \delta _1\), while the remaining eigenvalues are clustered near 2. It follows that there exists a constant K, depending only on \(n_1, \ldots , n_d\) and d, such that
https://static-content.springer.com/image/art%3A10.1007%2Fs10208-022-09551-1/MediaObjects/10208_2022_9551_Equ229_HTML.png
where the last step is by Lemma 5. This finishes the proof. \(\square \)

Proof of Lemma 13

Let J be the integral in question; i.e.,
$$\begin{aligned} J = \int _{0}^{\frac{\pi }{2}} \frac{1}{\Vert \cos (\theta )\mathbf {x} - \sin (\theta )\mathbf {y}\Vert ^{a}}\, \mathrm {d}{}\theta . \end{aligned}$$
Writing \(\Vert \cos (\theta )\mathbf {x} - \sin (\theta )\mathbf {y}\Vert = \sqrt{1-\sin (2\theta ) \langle \mathbf {x},\mathbf {y}\rangle }\) and exploiting the symmetry of \(\sin (2\theta )\) around \(\theta =\tfrac{\pi }{4}\), we have
$$\begin{aligned} J=2\int _{0}^{\frac{\pi }{4}}\frac{1}{\sqrt{1-\sin (2\theta ) \langle \mathbf {x},\mathbf {y}\rangle }^{\,a}} \mathrm {d}{}\theta \le \int _0^1\frac{1}{\sqrt{1-(1-t)\langle \mathbf {x},\mathbf {y}\rangle }^{\,a}\sqrt{t}} \,\mathrm {d}{}t; \end{aligned}$$
the inequality is due to the change of variables \(\sin (2\theta )=1-t\) and \(\sqrt{2t-t^2}\ge \sqrt{t}\) for \(|t|\le 1\). Let us write \(h:= \langle \mathbf {x},\mathbf {y}\rangle \). We distinguish between two cases. In the case \(h\le \tfrac{1}{2}\), we can bound
$$\begin{aligned} J\le \sqrt{2}^{\,a} \int _0^1 t^{-\frac{1}{2}} \,\mathrm {d}{} t = \sqrt{2}^{\,a+2} \le \frac{\sqrt{2}^{\,3a}}{\sqrt{2}^{\,a-1} \sqrt{1-h}^{\, a-1}}=\frac{\sqrt{2}^{\,3a+1}}{\Vert \mathbf {x} - \mathbf {y}\Vert ^{\, a-1}}. \end{aligned}$$
In the second case, \(h>\tfrac{1}{2}\), a further change of variables \(t=\tfrac{1-h}{h} \,u\) yields
$$\begin{aligned} J \le&\sqrt{\frac{1-h}{h}} \int _{0}^{\frac{h}{1-h}}\frac{1}{\sqrt{1-h+(1-h)u}^{\,a}\sqrt{u}}\,\mathrm {d}{}u \\=&\frac{1}{\sqrt{h}} \, \frac{1}{\sqrt{1-h}^{\,a-1}} \int _{0}^{\frac{h}{1-h}}\frac{1}{\sqrt{1+u}^{\,a}\sqrt{u}}\,\mathrm {d}{}u \\ \le&\frac{1}{\sqrt{h}} \, \frac{1}{\sqrt{1-h}^{\,a-1}} \left( \int _{0}^{1}\frac{1}{\sqrt{1+u} \sqrt{u}}\,\mathrm {d}{}u +\int _{1}^{\infty }\frac{1}{u^{(a+1)/2}}\,\mathrm {d}{}u \right) . \end{aligned}$$
The last integrals add to at most \(2+2(a-1)^{-1}\), and we thus have proved for \(h>\tfrac{1}{2}\):
$$\begin{aligned} J\le \frac{2(1+(a-1)^{-1})}{\sqrt{h}\sqrt{1-h}^{\,a-1}} \le \frac{\sqrt{2}^{\,3}(1+(a-1)^{-1})}{\sqrt{1-\langle \mathbf {x},\mathbf {y}\rangle }^{\, a-1}} =\frac{\sqrt{2}^{\,a+2}(1+(a-1)^{-1})}{\Vert \mathbf {x}-\mathbf {y}\Vert ^{a-1}}. \end{aligned}$$
The lemma is proved. \(\square \)
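As a numerical sanity check of the bound just proved, one may approximate J by a Riemann sum and compare it against the larger of the two constants obtained in the two cases above; the sketch assumes, as in the proof, that \(\mathbf {x}\) and \(\mathbf {y}\) are unit vectors, and the exponent and dimension below are illustrative.
```python
import numpy as np

rng = np.random.default_rng(7)
a, n = 3.0, 5                                   # an exponent a > 1 and an illustrative dimension
x = rng.standard_normal(n)
x /= np.linalg.norm(x)
y = rng.standard_normal(n)
y /= np.linalg.norm(y)

theta = np.linspace(0.0, np.pi / 2, 100001)
norms = np.linalg.norm(np.outer(np.cos(theta), x) - np.outer(np.sin(theta), y), axis=1)
J = np.mean(norms**(-a)) * (np.pi / 2)          # simple Riemann approximation of the integral

# the two constants from the cases h <= 1/2 and h > 1/2 of the proof
C = max(np.sqrt(2)**(3 * a + 1), np.sqrt(2)**(a + 2) * (1 + 1 / (a - 1)))
print(J <= C / np.linalg.norm(x - y)**(a - 1))  # True
```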

Proof of Lemma 14

We prove the lemma by induction. The base case \(d=1\) reads \(\cos (\theta _1) \le 1-\tfrac{\theta _1^2}{7}\). In fact, in this case we have the stronger inequality \(\cos (\theta ) \le 1-\tfrac{\theta ^2}{4}\), as the following argument shows: Consider the map \(f:[0,\tfrac{\pi }{2}] \rightarrow \mathbb {R}, \theta \mapsto 1 - \tfrac{\theta ^2}{4} - \cos (\theta )\). We have \(f(0)=0\) and \(f(\tfrac{\pi }{2}) = 1 - \tfrac{\pi ^2}{16}>0\). Moreover, \(f'(\theta )=\sin (\theta )- \tfrac{\theta }{2}>0\) for \(0< \theta \le \tfrac{\pi }{2}\), because \(\sin (\theta )\ge \tfrac{2}{\pi }\theta > \tfrac{\theta }{2}\) on that interval. Since \(f(0)=0\), this implies \(f\ge 0\) on \([0,\tfrac{\pi }{2}]\), proving the case \(d=1\). For general d, note that by the induction hypothesis
$$\begin{aligned} \cos (\theta _1)\cdots \cos (\theta _d)&=\cos (\theta _1)\cdots \cos (\theta _{d-1})\cos (\theta _d)\\&\le \left( 1-\frac{\theta _1^2+\cdots +\theta _{d-1}^2}{7(d-1)}\right) \cos (\theta _d)\\&\le \left( 1-\frac{\theta _1^2+\cdots +\theta _{d-1}^2}{7d}\right) \left( 1-\frac{\theta _d^2}{4}\right) \\&=1-\frac{\theta _1^2+\cdots +\theta _d^2}{7d}+\frac{\theta _1^2+\cdots +\theta _{d-1}^2-7d+4}{28d}\,\theta _d^2. \end{aligned}$$
Since \(\theta _1^2+\cdots +\theta _d^2-7d+4\le d\,\tfrac{\pi ^2}{4}-7d+4\le 0\) for \(d\ge 2\), we conclude the proof of the lemma. \(\square \)
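The inequality of Lemma 14 can be verified numerically by sampling angles uniformly from \([0,\tfrac{\pi }{2}]^d\); a minimal sketch (sample sizes and choices of d are illustrative) is the following.
```python
import numpy as np

rng = np.random.default_rng(8)
for d in (1, 2, 3, 5, 8):
    theta = rng.uniform(0.0, np.pi / 2, size=(100000, d))
    lhs = np.prod(np.cos(theta), axis=1)                   # cos(theta_1) * ... * cos(theta_d)
    rhs = 1.0 - np.sum(theta**2, axis=1) / (7 * d)         # 1 - (theta_1^2 + ... + theta_d^2)/(7d)
    print(d, bool(np.all(lhs <= rhs + 1e-12)))             # True for every d
```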
Footnotes
1
The elements of \(\sigma _{r;n_1,\ldots ,n_d}^\mathbb {C}\) can be parameterized as in Eq. (1) changing \(\mathbb {R}\) to \(\mathbb {C}\).
 
2
The preimage of an r-identifiable tensor under the map \(\varPhi \) consists of the r! permutations of the summands.
 
3
This is exactly the concern of Remark 1: What computational problem are we interested in solving when a tensor has several distinct CPDs? Are we interested in the CPD with the best sensitivity? Or the worst? Or the expected condition number of one randomly chosen CPD in the fiber? This depends on the context. The results of this paper are valid regardless of the particular variation of the problem one is interested in.
 
4
This is different from \(\phi |_{\phi ^{-1}(\phi (\mathcal {U}))}\) being a diffeomorphism. Indeed, that mapping is in general finite-to-one.
 
5
See [57, Chapter 7] and the references therein for some results on equations of the algebraic closure of \(\sigma _r\).
 
Literature
1. Allman ES, Matias C, Rhodes JA (2009) Identifiability of parameters in latent structure models with many observed variables. Ann Statist 37(6A):3099–3132
4. Anandkumar A, Ge R, Hsu D, Kakade SM, Telgarsky M (2014) Tensor decompositions for learning latent variable models. J Mach Learn Res 15:2773–2832
5. Angelini E, Bocci C, Chiantini L (2017) Real identifiability vs. complex identifiability. Linear Multilinear Algebra 66:1257–1267
6. Armentano D, Beltrán C (2019) The polynomial eigenvalue problem is well conditioned for random inputs. SIAM J Matrix Anal Appl 40(1):175–193
8. Beltrán C, Kozhasov K (2020) The real polynomial eigenvalue problem is well conditioned on the average. Found Comput Math 20(2):291–309
9.
10. Beltrán C, Marzo J, Ortega-Cerdà J (2016) Energy and discrepancy of rotationally invariant determinantal point processes in high dimensional spheres. J Complexity 37:76–109
11. Beltrán C, Breiding P, Vannieuwenhoven N (2019) Pencil-based algorithms for tensor rank decomposition are not stable. SIAM J Matrix Anal Appl 40(2):739–773
12. Benedetti R, Risler JJ (1990) Real algebraic and semi-algebraic sets. Actualités Mathématiques [Current Mathematical Topics]. Hermann, Paris
13. Bergqvist G, Forrester PJ (2011) Rank probabilities for real random \(N \times N \times 2\) tensors. Elect Comm in Probab 16:630–637
14.
16. Blum L, Cucker F, Shub M, Smale S (1998) Complexity and Real Computation. Springer-Verlag, New York
17.
19. Breiding P, Timme S (2018) HomotopyContinuation.jl: A package for homotopy continuation in Julia. Mathematical Software – ICMS 2018, Lecture Notes in Computer Science. Software available at www.juliahomotopycontinuation.org
20.
21. Breiding P, Vannieuwenhoven N (2018) Convergence analysis of Riemannian Gauss-Newton methods and its connection with the geometric condition number. Appl Math Letters 78:42–50
22. Breiding P, Vannieuwenhoven N (2018) A Riemannian trust region method for the canonical tensor rank approximation problem. SIAM J Optim 28:2435–2465
23. Breiding P, Vannieuwenhoven N (2020) On the average condition number of tensor rank decompositions. IMA J Numer Anal 40(3):1908–1936
24.
25. Buczyński J, Landsberg JM (2013) Ranks of tensors and a generalization of secant varieties. Linear Algebra Appl 15:668–689
28. Bürgisser P, Cucker F (2013) Condition: The Geometry of Numerical Algorithms, Grundlehren der mathematischen Wissenschaften, vol 349. Springer-Verlag
29. Bürgisser P, Clausen M, Shokrollahi MA (1997) Algebraic Complexity Theory, Grundlehren der mathematischen Wissenschaften, vol 315. Springer, Berlin
30. Castro D, Montaña JL, Pardo LM, San Martín J (2002) The distribution of condition numbers of rational data of bounded bit length. Found Comput Math 2:1–52
32. Chiantini L, Ottaviani G (2012) On generic identifiability of \(3\)-tensors of small rank. SIAM J Matrix Anal Appl 33(3):1018–1037
33. Chiantini L, Ottaviani G, Vannieuwenhoven N (2014) An algorithm for generic and low-rank specific identifiability of complex tensors. SIAM J Matrix Anal Appl 35(4):1265–1287
34. Chiantini L, Ottaviani G, Vannieuwenhoven N (2017) Effective criteria for specific identifiability of tensors and forms. SIAM J Matrix Anal Appl 38(2):656–681
35.
36. Comon P, Jutten C (2010) Handbook of Blind Source Separation: Independent Component Analysis and Applications. Elsevier
37. de Silva V, Lim LH (2008) Tensor rank and the ill-posedness of the best low-rank approximation problem. SIAM J Matrix Anal Appl 30(3):1084–1127
40. Domanov I, De Lathauwer L (2015) Generic uniqueness conditions for the canonical polyadic decomposition and INDSCAL. SIAM J Matrix Anal Appl 36(4):1567–1589
42. Ergür AA, Paouris G, Rojas JM (2019) Probabilistic condition number estimates for real polynomial systems I: A broader family of distributions. Found Comput Math 19(1):131–157
43. Geršgorin S (1931) Über die Abgrenzung der Eigenwerte einer Matrix. Bulletin de l'Académie des Sciences de l'URSS, Classe des sciences mathématiques et naturelles (6):749–754
44. Greub WH (1978) Multilinear Algebra. Springer-Verlag
45. Hackbusch W (2012) Tensor Spaces and Numerical Tensor Calculus, Springer Series in Computational Mathematics, vol 42. Springer-Verlag
46. Harris J (1992) Algebraic Geometry, A First Course, Graduate Texts in Mathematics, vol 133. Springer-Verlag
47. Hauenstein J, Oeding L, Ottaviani G, Sommese A (2016) Homotopy techniques for tensor decomposition and perfect identifiability. J Reine Angew Math
48. Hauenstein JD, Sottile F (2012) Algorithm 921: alphaCertified: Certifying solutions to polynomial systems. ACM Trans Math Softw 38(28):20
49. Hauser R, Müller T (2009) Conditioning of random conic systems under a general family of input distributions. Found Comput Math 9:335–358
50. Hitchcock FL (1927) The expression of a tensor or a polyadic as a sum of products. J Math Phys 6:164–189
51. Horn R, Johnson C (1990) Matrix Analysis, 2nd edn. Cambridge University Press, New York
52. Howard R (1993) The kinematic formula in Riemannian homogeneous spaces. Mem Amer Math Soc 106(509):vi+69
54. Kroonenberg PM (2008) Applied Multiway Data Analysis. Wiley Series in Probability and Statistics, John Wiley & Sons, Hoboken, New Jersey
55. Kruskal JB (1977) Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics. Linear Algebra Appl 18:95–138
56. Lairez P (2017) A deterministic algorithm to compute approximate roots of polynomial systems in polynomial average time. Found Comput Math 17:1265–1292
57. Landsberg JM (2012) Tensors: Geometry and Applications, Graduate Studies in Mathematics, vol 128. AMS, Providence, Rhode Island
58. Lee JM (2013) Introduction to Smooth Manifolds, Graduate Texts in Mathematics, vol 218, 2nd edn. Springer-Verlag, New York
59. McCullagh P (1987) Tensor Methods in Statistics. Monographs on Statistics and Applied Probability, Chapman and Hall, New York
62. Shub M, Smale S (1993) Complexity of Bezout's theorem. II. Volumes and probabilities. In: Computational algebraic geometry (Nice, 1992), Progr. Math., vol 109, Birkhäuser Boston, Boston, MA, pp 267–285
64. Sidiropoulos ND, Giannakis GB, Bro R (2000) Blind PARAFAC receivers for DS-CDMA systems. IEEE Trans Signal Process 48:810–823
65. Sidiropoulos ND, De Lathauwer L, Fu X, Huang K, Papalexakis EE, Faloutsos C (2017) Tensor decomposition for signal processing and machine learning. IEEE Trans Signal Process 65(13):3551–3582
67. Smilde A, Bro R, Geladi P (2004) Multi-way Analysis: Applications in the Chemical Sciences. John Wiley & Sons, Hoboken, New Jersey
68. Spielman DA, Teng SH (2003) Smoothed analysis of termination of linear programming algorithms. vol 97, pp 375–404, ISMP 2003 (Copenhagen)
69. Trefethen LN, Bau D (1997) Numerical Linear Algebra. SIAM
70. Whitney H (1957) Elementary structure of real algebraic varieties. Ann Math 66(3)