We address the noncommutative version of Edmonds' problem, which asks for the inner rank of a matrix in noncommuting variables. We provide an algorithm for computing this inner rank by relating the problem to the distribution of a basic object in free probability theory, namely operator-valued semicircular elements. This requires solving a matrix-valued quadratic equation, for which we provide precise analytical and numerical control of the corresponding fixed point iteration. Numerical examples show the efficiency of the algorithm.
Notes
Communicated by Alan Edelman.
This work was supported by the SFB-TRR 195 “Symbolic Tools in Mathematics and their Application” of the German Research Foundation (DFG).
We thank two referees whose questions and suggestions not only resulted in a total reorganisation of this paper, but also motivated the investigations in [25].
1 Introduction
We will address the question of how to compute the rank of matrices in noncommuting variables. The classical, commuting version of this is usually called Edmonds' problem and goes back to [9]: given a square matrix whose entries are linear functions in commuting variables, one should decide whether this matrix is invertible over the field of rational functions; more generally, one should calculate the rank of the matrix over the field of rational functions.
What we will consider here is a noncommutative version of this, where commuting variables are replaced by noncommuting ones; the usual rational functions are replaced by noncommutative rational functions (aka free skew field); and the commutative rank is replaced by the inner rank. The free skew field is a quite complicated object; there is no need for us to delve deeper into its theory, as we can and will state the problem in terms of the rank. For more information on the Edmonds’ problem and its noncommutative version we refer to the first section of [14]; in particular, this contains descriptions and references for the various incarnations of the free field, its rank and the equivalent characterizations of the fullness of such noncommutative matrices.
Let us be a bit more precise. Consider a matrix \(A=a_1\otimes \textbf{x}_1+\ldots +a_n\otimes \textbf{x}_n\), where \(\textbf{x}_1,\dots ,\textbf{x}_n\) are noncommuting variables and \(a_1,\dots ,a_n\in M_N(\mathbb {C})\) are arbitrary matrices of the same size N; N is fixed, but arbitrary. The noncommutative Edmonds' problem now asks whether A is invertible over the noncommutative rational functions in \(\textbf{x}_1,\dots ,\textbf{x}_n\); this is equivalent (by deep results of Cohn, see [7]) to asking whether A is full, i.e., whether its noncommutative rank is equal to N. The noncommutative rank is here the inner rank \(\operatorname {rank}(A)\), i.e., the smallest integer r such that we can write A as a product of an \(N\times r\) and an \(r\times N\)-matrix. Fullness can also be decided equivalently via a more analytic object (see [14]): to the matrix A from above we associate a completely positive map \(\eta :M_N(\mathbb {C})\rightarrow M_N(\mathbb {C})\), defined by \(\eta (b):=\sum _{i=1}^n a_iba_i^*\). In terms of \(\eta \), the fullness condition for A is equivalent to \(\eta \) being rank non-decreasing (here we mean, of course, the ordinary commutative rank on the complex \(N\times N\)-matrices).
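For illustration, the map \(\eta \) is easy to realize numerically; the following Python sketch (with randomly chosen integer coefficient matrices as a stand-in for a concrete instance) implements \(\eta \) and checks that it maps positive semidefinite matrices to positive semidefinite matrices:

```python
import numpy as np

def eta(b, coeffs):
    """The completely positive map eta(b) = sum_i a_i b a_i^*."""
    return sum(a @ b @ a.conj().T for a in coeffs)

rng = np.random.default_rng(0)
coeffs = [rng.integers(-2, 3, size=(3, 3)).astype(complex) for _ in range(2)]

# complete positivity: eta maps positive semidefinite inputs to
# positive semidefinite outputs, since each a_i b a_i^* is PSD
c = rng.standard_normal((3, 3))
b = (c @ c.T).astype(complex)               # a PSD test input
eigs = np.linalg.eigvalsh(eta(b, coeffs))   # eta(b) is Hermitian
```

Note also that \(\eta (\textbf{1}) = \sum _i a_i a_i^*\), which will reappear below in the norm bound for the associated operator S.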
This noncommutative Edmonds’ problem has become quite prominent in recent years and there are now a couple of deterministic algorithms, see [6, 13, 14, 16, 20]. We will provide here another analytic approach to decide the fullness of A, actually to calculate more generally the inner rank of A. Our main point is to give a different perspective on the problem, by relating it with recent progress in free probability theory. For general information on free probability we refer to [24, 32] and Appendix A; various numerical aspects of free probability were especially treated in [11, 27, 28].
We propose here a noncommutative probabilistic approach to the noncommutative Edmonds' problem, replacing the formal variables \(\textbf{x}_i\) by concrete operators on infinite-dimensional Hilbert spaces. A particularly nice choice for these analytic operators are freely independent semicircular variables \(s_1,\dots ,s_n\), which are the noncommutative analogues of independent Gaussian random variables. In the case where all the \(a_i\) are selfadjoint, the matrix \(S=a_1\otimes s_1+\ldots +a_n\otimes s_n\) is a matrix-valued semicircular element. Since for arbitrary A the inner rank of the \(N\times N\)-matrix A is half the inner rank of the selfadjoint \(2N\times 2N\)-matrix
$$\begin{aligned} \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix}, \end{aligned}$$
as one can deduce from Lemma 5.5.3 and Theorem 5.5.4 in [7], it suffices to consider in the following the selfadjoint situation. Then the corresponding S is also a selfadjoint operator, hence its distribution \(\mu _S\) (in the sense of Appendix A.1) is a probability measure on \(\mathbb {R}\).
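This hermitization step can be sketched as follows (Python; `hermitize` is a hypothetical helper name, and the check at the end illustrates the rank-doubling for an ordinary scalar matrix):

```python
import numpy as np

def hermitize(a):
    """Embed an N x N coefficient a into the selfadjoint 2N x 2N block
    matrix [[0, a], [a^*, 0]] of the hermitized pencil."""
    N = a.shape[0]
    zero = np.zeros((N, N), dtype=complex)
    return np.block([[zero, a], [a.conj().T, zero]])

a = np.array([[1, 2], [3, 4]], dtype=complex)
b = hermitize(a)
assert np.allclose(b, b.conj().T)   # the hermitization is selfadjoint
```

Applying `hermitize` to each coefficient \(a_i\) produces the selfadjoint coefficients of the \(2N\times 2N\) pencil.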
By recent results of [26] we know that the invertibility of A over the noncommutative rational functions is equivalent to the invertibility of S as an unbounded operator. But this is equivalent to the question whether S has a trivial kernel, which is equivalent to the question whether its distribution \(\mu _S\) has no atom at zero.
The distribution \(\mu _S\) of our matrix-valued semicircular element S can, by results of free probability theory, be described as follows. The Cauchy transform g(z) of \(\mu _S\), that is, the analytic function
$$\begin{aligned} g(z)=\int _\mathbb {R}\frac{1}{z-t}\, d\mu _S(t) \qquad (z\in \mathbb {C}^+), \end{aligned}$$
(1.1)
is given by
$$\begin{aligned} g(z) = \operatorname {tr}_N\bigl (G(z)\bigr ), \end{aligned}$$
(1.2)
where \(G:\mathbb {C}^+\rightarrow M_N(\mathbb {C})\) solves the matrix equation
$$\begin{aligned} G(z) = \bigl (z\textbf{1}- \eta (G(z))\bigr )^{-1} \end{aligned}$$
(1.3)
in the lower half-plane \(\mathbb {H}^-(M_N(\mathbb {C}))\) of \(M_N(\mathbb {C})\), where \(\textbf{1}\in M_N(\mathbb {C})\) denotes the identity matrix. One should note that for each \(z\in \mathbb {C}^+\) there is (by results of [19]) exactly one such solution G(z) in the lower half-plane of \(M_N(\mathbb {C})\).
One knows that \(\mu _S\) can have an atom only at zero, and the inner rank of A is related to the mass of this atom at zero; more precisely,
$$\begin{aligned} \operatorname {rank}(A) = N\,\bigl (1-\mu _S(\{0\})\bigr ). \end{aligned}$$
(1.4)
All those statements are non-trivial and follow from works in the last decade or so on free probability and in particular on analytic realizations of noncommutative formal variables via free semicircular random variables; a precise formulation for this connection will be given in Theorem 4.3.
In order to make concrete use of this relation between the inner rank of A and the size of the atom of \(\mu _S\) at zero, we need precise analytical and numerical control of the fixed point algorithm for solving Eq. (1.3), and we also have to deal with the deconvolution problem of extracting information about atoms of a probability measure from the knowledge of its Cauchy transform on the imaginary axis. In the next section we will give a precise high-level description of our algorithm. The details, proofs, and examples will then be filled in in later sections.
Whereas the noncommutative Edmonds’ problem can be stated over any field \(\mathbb {F}\) (that is, all \(a_j\in M_N(\mathbb {F})\)), our free probability and Banach space techniques are rooted in the field of complex numbers, thus we will in the following restrict to \(\mathbb {F}=\mathbb {C}\) (or subfields of \(\mathbb {C}\)).
The original problem of calculating the inner rank of our matrix A can be stated as solving a system of quadratic equations, and the matrix Eq. (1.3) above is also not more than a system of quadratic equations for the entries of G. So one might wonder what advantage it brings to trade in one system of quadratic equations for another one. The point is that our system (1.3) has a lot of structure, especially positivity in the background, coming from the free probability interpretation of this setting, and thus we have in particular analytically controllable fixed point iterations to solve those systems.
Apart from this Introduction the paper has three more sections and an appendix. In Sect. 2, we present our algorithm, together with a high-level explanation of why it works. Section 3 then gives two concrete numerical examples for the algorithm, one for the full case and one for the non-full case. In the latter example, we also show how one can actually calculate the inner rank in the non-full case by looking at enlarged matrices. Section 4 provides some basic introduction to the relevant theoretical background from free probability theory; in particular, we explain the relation between the algebraic problem of calculating the inner rank and the analytic problem of calculating atoms in the distribution of corresponding matrix-valued semicircular elements. The more advanced technical tools around operator-valued free probability are deferred to Appendix A. In particular, our quite technical estimates and termination statements for the fixed point iteration of the key Eq. (1.3) will appear in Appendix B.
2 The Algorithm
Given \(a_1,\dots ,a_n\in M_N(\mathbb {C})\) we want to calculate the inner rank of the formal matrix \(A=a_1\otimes \textbf{x}_1+\ldots +a_n\otimes \textbf{x}_n\) in noncommuting variables \(\textbf{x}_1,\dots ,\textbf{x}_n\). Since we can only encode rational entries on the computer and those can, by scaling of the matrices, be changed to integers (without affecting the inner rank), we consider matrices over \(\mathbb {Z}[i]=\mathbb {Z}+i\mathbb {Z}\). Such an assumption of integer-valued entries is also relevant for our key theoretical estimate in Corollary 4.9. For simplicity, we restrict our presentation to matrices over \(\mathbb {Z}\), that is, we assume \(a_i\in M_N(\mathbb {Z})\).
As mentioned before, the general case can be reduced to the selfadjoint case, and thus in the following we will also always assume that all \(a_i\) are selfadjoint. In that case, we need the information about the size of the atom at zero of the probability measure \(\mu _S\) of the corresponding operator \(S=a_1\otimes s_1+\ldots +a_n\otimes s_n\). The Cauchy transform g(z) of this measure is given by Eqs. (1.1) and (1.2), via solving Eq. (1.3), where \(\eta \) is the completely positive map
$$\begin{aligned} \eta :M_N(\mathbb {C})\rightarrow M_N(\mathbb {C}),\qquad \eta (b)=\sum _{i=1}^n a_i b a_i^*. \end{aligned}$$
We will show in Proposition 4.1 that the function \(\theta _S(y):= - y \operatorname {Im}(g(iy))\) on \(\mathbb {R}^+:= (0,\infty )\) contains the information about the atom at zero. In particular, this function is monotonically increasing and converges, as \(y\searrow 0\), from above to the size of the atom.
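Numerically, \(\theta _S(y)\) can be approximated by iterating the fixed point form \(G \mapsto (iy\,\textbf{1}-\eta (G))^{-1}\) of Eq. (1.3). The following Python sketch is a simplified stand-in for our Julia implementation; the \(2\times 2\) coefficient matrices at the end are hypothetical example data:

```python
import numpy as np

def theta_S(y, coeffs, tol=1e-12, max_iter=200000):
    """Approximate theta_S(y) = -y Im g(iy) for the matrix-valued
    semicircular element with covariance eta(b) = sum_i a_i b a_i^*,
    by iterating G <- (iy*1 - eta(G))^{-1}."""
    N = coeffs[0].shape[0]
    eye = np.eye(N, dtype=complex)
    G = -1j * eye                       # start in the lower half-plane
    for _ in range(max_iter):
        eta_G = sum(a @ G @ a.conj().T for a in coeffs)
        G_new = np.linalg.inv(1j * y * eye - eta_G)
        done = np.linalg.norm(G_new - G) < tol
        G = G_new
        if done:
            break
    g = np.trace(G) / N                 # g(iy) = tr_N(G(iy))
    return -y * g.imag

# hypothetical selfadjoint coefficients (n = 2, N = 2)
a1 = np.array([[1, 0], [0, -1]], dtype=complex)
a2 = np.array([[0, 1], [1, 0]], dtype=complex)
val = theta_S(0.01, [a1, a2])
```

For this particular choice one has \(\eta (\gamma \textbf{1}) = 2\gamma \textbf{1}\), so S is just a semicircular element of variance 2 and has no atom; accordingly the computed value stays far below the threshold 1/N = 1/2.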
Let us first restrict to the task of deciding whether an atom exists or not, that is, whether the rank is maximal or not. Since, by the relation (1.4) (see also Theorem 4.3), the possible sizes for the atom can only be multiples of 1/N, one can be sure that no atom exists if one has \(\theta _S(y_0)< 1/N\) for some \(y_0>0\). On the other hand, having \(\theta _S(y_0)\ge 1/N\) does not, in general, allow one to conclude anything, because we might fall below 1/N for some smaller y. However, for the distributions \(\mu _S\) of our matrix-valued semicircular elements the situation is better, since in these cases we have, under the assumption of no atom at zero, some a priori information about the accumulation of the distribution \(\mu _S\) around zero.
By Theorem 4.3 we know that our measure is of regular type, that is, in some neighborhood of 0 one has \(\mu _S([-r,r])\le c r^\beta \). Though we know by abstract arguments that all our \(\mu _S\) are of regular type, we have no way of determining the values of \(\beta \) or c in concrete cases, nor do we have general estimates on them for all our \(\mu _S\). Hence this regularity information cannot be used for choosing a \(y_0\) in our algorithm at the moment. However, we hope that in the future we will be able to derive more precise information about those regularity parameters.
This absence of general control on the regularity parameters c and \(\beta \) had the effect that in the first version of this paper we could not provide a certificate for the termination of our algorithm. Encouraged by the questions of two referees we looked for a more definitive result in this direction and could achieve a recent breakthrough on this in [25]. Namely, instead of relying on the regularity parameters c and \(\beta \), we relate the behaviour of our measure \(\mu _S\) in the neighborhood of 0 with the so-called Fuglede–Kadison determinant \(\Delta (S)\) of our matrix-valued semicircular operator S (see Definition 4.5). We will derive in Corollary 4.7 that if S is invertible as an unbounded operator (which means that the corresponding A has full rank) then we have for any \(\delta >0\):
An upper bound for \(\Vert S\Vert \) and a lower bound for \(\Delta (S)\) will thus lead to a value for \(y_0\). The upper bound \(\Vert S\Vert \le 2\Vert \eta (\textbf{1})\Vert ^{1/2}\) for matrix-valued semicircular elements with covariance mapping \(\eta \) is well-known (see the discussion around Theorem 9.2 in [24]). In contrast to the situation with the regularity parameters, we also have a lower estimate for \(\Delta \) for our matrix-valued semicircular elements; namely in [25] we have shown that in the case where all our coefficient matrices have integer valued entries – that is, \(a_j\in M_N(\mathbb {Z})\) for all j – we have the uniform estimate \(\Delta (S)\ge e^{-1/2}\). This allows us to prescribe a concrete value for \(y_0\) in our algorithm, such that the value of \(\theta _S(y_0)\) decides upon whether we have an atom or not. Namely, if we set
the estimates from above guarantee, in the case \(a_j\in M_N(\mathbb {Z})\) and for \(\delta <2\), that \(\theta _S(y_0)\le \delta \) if S is invertible; see also Corollary 4.9.
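The upper bound \(\Vert S\Vert \le 2\Vert \eta (\textbf{1})\Vert ^{1/2}\) is easy to evaluate numerically; a Python sketch (with hypothetical coefficient matrices):

```python
import numpy as np

def norm_bound(coeffs):
    """Upper bound 2*||eta(1)||^{1/2} for the operator norm of the
    matrix-valued semicircular element with covariance eta."""
    eta_one = sum(a @ a.conj().T for a in coeffs)  # eta(1) = sum_i a_i a_i^*
    return 2 * np.linalg.norm(eta_one, 2) ** 0.5   # spectral norm of eta(1)

a1 = np.array([[1, 0], [0, -1]], dtype=complex)
a2 = np.array([[0, 1], [1, 0]], dtype=complex)
bound = norm_bound([a1, a2])
```

Here \(\eta (\textbf{1}) = 2\,\textbf{1}\), so the bound evaluates to \(2\sqrt{2}\).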
In order to get a certificate for the invertibility of S we choose \(\delta =\frac{1}{2N}\) and the corresponding
If we then allow in our iteration algorithm for the calculation of \(\theta _S(y_0)\) an error strictly less than \(\frac{1}{4N}\), we have the following two possibilities for the value of \(\theta _S(y_0)\).
(i)
If S is invertible then, by the arguments above, \(\theta _S(y_0)\le 1/{(2N)}\), and hence our calculated value \({\tilde{\theta }}\) must be strictly less than 3/(4N);
(ii)
if, on the other hand, S is not invertible, then we must have an atom of size at least 1/N, that is \(\theta _S(y_0)\ge 1/N\), and thus in our calculation we must see a value \({\tilde{\theta }}\) which is strictly greater than 3/(4N).
These observations prove the correctness of Algorithm 1.
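The resulting decision step is elementary; as a sketch (Python, with a hypothetical function name):

```python
def is_full(theta_tilde, N):
    """Decision step: theta_tilde approximates theta_S(y0) with error
    strictly less than 1/(4N); values below 3/(4N) certify full rank,
    values above certify an atom at zero (rank deficiency)."""
    return theta_tilde < 3.0 / (4 * N)
```

For example, with N = 3 the threshold is 1/4: an approximation \({\tilde{\theta }} = 0.1\) certifies fullness, while \({\tilde{\theta }} = 0.4\) certifies an atom.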
[Algorithm 1 and Algorithm 2 are displayed here.]
One should note that our results guarantee termination of Algorithm 2 and hence of Algorithm 1, but the number of steps in our fixed point Algorithm 2 for the solution of Eq. (1.3) will in the worst case, by the a priori estimates (B.7) in Corollary B.5, be exponential in the matrix size N. However, by Proposition B.6 and the corresponding Corollary B.10 we also have a posteriori estimates, which usually lead to a much earlier termination of the fixed point iteration. One should also note that numerical instabilities are not an issue for our fixed point algorithm, since (B.3) (see also Corollary B.5) guarantees that our iteration map stays within a compact region and is bounded away from singularities.
Improving our control of the regularity parameters might lead to better complexity behaviour of the algorithm. The results in [4, 21] give us some hope that progress on the regularity parameters should be possible. Also, the numerical simulations indicate that the control of \(\mu _S\) around zero via the Fuglede–Kadison determinant is not optimal: in all our simulations, \(\theta _S(y)\) stabilizes to its limiting value for much bigger values of y than the \(y_0\) derived from our determinant estimate.
Let us finally extend our task from deciding whether the rank is maximal to actually calculating the rank. Our numerical examples suggest again that at \(y_0\) the value of \(\theta _S(y_0)\) should already have stabilized at its limit value, not only in the case of full rank, but in general. For a rigorous proof of this we would need to extend our estimates of the Fuglede–Kadison determinant to the case where S is not invertible. Of course, then \(\Delta (S)=0\) and thus no direct information can be obtained from this. However, there exists a modified version of \(\Delta \) which ignores the atomic part at zero of the distribution. If we could extend our lower bounds on the determinant to this setting, this would certify that our calculation of \(\theta _S(y_0)\) not only distinguishes between full and not full, but actually gives the value of the noncommutative rank directly.
For the moment, however, we have to rely on the usual methods to reduce the calculation of the rank to the problem of deciding fullness. There are different ways to do so; we refer to the appendix of [14] for a discussion of this. In the following section we will also present a concrete example of how this works in our setting.
3 Examples
All numerical examples in this section were computed using our library NCDist.jl ([17]) for the Julia programming language ([2]).
in three formal noncommuting variables \(\textbf{x}_1,\textbf{x}_2,\textbf{x}_3\). This matrix A has inner rank 3, i.e., it is full. This can be seen by our approach as follows.
According to Theorem 4.3 we have to look for an atom at zero of the analytic distribution \(\mu :=\mu _S\) of the operator-valued semicircular element
is then the Cauchy transform of the desired analytic distribution \(\mu \), which is depicted in Fig. 1.
[Fig. 1: the analytic distribution \(\mu \) of S]
From this it is apparent that there is no atom at zero. In order to prove this rigorously we rely on Proposition 4.1. Figure 2 shows our calculated approximation \({\tilde{\theta }}\) of the function \(\theta (y):=- y\operatorname {Im}(g(iy))\) (with an error of strictly less than 1/12) as a function of the distance y from the real axis, on a logarithmic scale.
[Fig. 2: the approximation \({\tilde{\theta }}\) of \(\theta (y)\)]
As soon as we fall with \({\tilde{\theta }}\) below \(1/3-1/12=1/4\), we can be sure that there is no atom at zero, and thus the inner rank of A is equal to 3. Note, however, that we do not have to calculate the whole plot as presented in Fig. 2. Our algorithm tells us that, with \(N=3\) and
this position of \(y_0\) is marked as the orange vertical line in the plot. We calculate the approximation \({\tilde{\theta }}\) of \(\theta (y_0)\) at this position. This value \({\tilde{\theta }}\) is clearly below 1/4, which certifies that we have no atom and hence that the rank is 3. One should note, however, that this value of \(y_0\) is actually much smaller than the value \(y\approx 10^{-1}\) at which \({\tilde{\theta }}\) falls below 1/4. This suggests that our estimate for the behaviour of \(\mu \) around zero, via the Fuglede–Kadison determinant, is much too conservative and has potential for improvement through control of the regularity parameters.
in four formal noncommuting variables \(\textbf{x}_1,\textbf{x}_2,\textbf{x}_3,\textbf{x}_4\). This matrix A is not full, but has inner rank 2. This can be seen by our approach as follows.
First, we have to decide whether A is full or not. For this we have to decide whether the distribution \(\mu \) of the corresponding matrix-valued semicircular element has an atom at zero or not. Figure 3 shows the plot of our approximations \({\tilde{\theta }}\) for \(\theta (y)\), with an error of strictly less than 1/16. Again, we only have to calculate this for \(y_0\approx 10^{-21}\). Since the value there is bigger than 3/16, this certifies that there is an atom at zero. Since the value is also smaller than \(3/4-1/16\), the size of the atom cannot be 3/4 or bigger. Hence we are left with the two possibilities that the size of the atom is 1/4 or 1/2, that is that the rank of A is either 3 or 2.
[Fig. 3: the approximation \({\tilde{\theta }}\) for the second example]
Though the whole plot and also the value of \({\tilde{\theta }}\) at this \(y_0\) suggest very convincingly that the size of the atom should be 1/2, our present theoretical estimates do not allow us to distinguish between the two cases by just relying on this \({\tilde{\theta }}\).
In order to get a certificate for deciding between the two possibilities, we now use the procedure described in the appendix of [14]. Namely, in order to check whether A has rank 3 we should consider the \(5\times 5\)-matrix where A is enlarged with an extra row and column of new formal variables; that is, we consider
One has that A has rank 3 if and only if B has full rank, that is, rank 5. Figure 4 shows \({\tilde{\theta }}\) for this B with the marked value of \(y_0\approx 10^{-26}\). From this we see that B does not have full rank (actually, we see that it has rank 4), so A cannot have rank 3, and thus it must have rank 2.
[Fig. 4: the approximation \({\tilde{\theta }}\) for the enlarged matrix B]
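The enlargement step can be sketched programmatically; the following Python helper (`enlarge` is a hypothetical name, and the coefficient matrices are numerical stand-ins for the formal pencil) borders a pencil with one extra row and column of fresh variables:

```python
import numpy as np

def enlarge(coeffs):
    """Border an N x N pencil with one extra row and column of fresh
    formal variables: pad the old coefficients with a zero row/column
    and add one elementary coefficient matrix per new entry."""
    N = coeffs[0].shape[0]
    padded = [np.pad(a, ((0, 1), (0, 1))) for a in coeffs]
    fresh = []
    for i in range(N + 1):              # new last column
        e = np.zeros((N + 1, N + 1)); e[i, N] = 1.0
        fresh.append(e)
    for j in range(N):                  # new last row (corner already done)
        e = np.zeros((N + 1, N + 1)); e[N, j] = 1.0
        fresh.append(e)
    return padded + fresh

coeffs_B = enlarge([np.eye(4)])         # a 4x4 pencil becomes a 5x5 one
```

Starting from a pencil in n variables, the bordered pencil uses \(n + 2N + 1\) variables; its coefficients can then be fed into the same fullness test as before.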
4 Free Probability as a Bridge Between the Algebraic and the Analytic Problem
4.1 Cauchy Transform and Stieltjes Inversion Formula
An important tool for the study of analytic distributions is the Cauchy transform (also known as the Stieltjes transform, especially in random matrix theory, though usually with a different sign convention there). For any finite Borel measure \(\mu \) on the real line \(\mathbb {R}\) (note that "measure" always means "positive measure"), the Cauchy transform of \(\mu \) is defined as the holomorphic function
$$\begin{aligned} g_\mu :\ \mathbb {C}^+ \rightarrow \mathbb {C}^-,\quad z \mapsto \int _\mathbb {R}\frac{1}{z-t}\, d\mu (t), \end{aligned}$$
where \(\mathbb {C}^+\) and \(\mathbb {C}^-\) denote the upper- and lower complex half-plane, respectively, that is, \(\mathbb {C}^\pm := \{z\in \mathbb {C}\mid \pm \operatorname {Im}(z) > 0\}\).
Let \(\mu \) be a finite Borel measure on the real line \(\mathbb {R}\). It is well-known that one can recover \(\mu \) with the help of the Stieltjes inversion formula from its Cauchy transform \(g_\mu \). More precisely, one has that the finite Borel measures \(\mu _\varepsilon \) defined by \(d\mu _\varepsilon (t):= -\frac{1}{\pi } \operatorname {Im}(g_\mu (t+i\varepsilon ))\, dt\) converge weakly to \(\mu \) as \(\varepsilon \searrow 0\). Here, we aim at computing the value \(\mu (\{0\})\) from the knowledge of (arbitrarily good approximations of) \(g_\mu \); while we are mostly interested in the case of probability measures, we cover here the more general case of finite measures. To this end, we define the function
$$\begin{aligned} \theta _\mu :\ \mathbb {R}^+ \rightarrow \mathbb {R}^+,\quad y \mapsto - y \operatorname {Im}(g_\mu (iy)) \end{aligned}$$
on \(\mathbb {R}^+:= (0,\infty )\). Note that \(\theta _\mu (y) = \operatorname {Re}(iy g_\mu (iy))\) for all \(y\in \mathbb {R}^+\).
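As a quick numerical illustration of this definition (Python; the toy measure \(\mu = \frac{1}{2}\delta _0 + \frac{1}{2}\delta _1\) is chosen for simplicity), \(\theta _\mu \) is increasing in y and tends to the atom \(\mu (\{0\})=\frac{1}{2}\) as \(y\searrow 0\):

```python
def theta(y):
    """theta_mu(y) = -y Im g_mu(iy) for mu = (delta_0 + delta_1)/2,
    whose Cauchy transform is g_mu(z) = 1/(2z) + 1/(2(z-1))."""
    g = 0.5 / (1j * y) + 0.5 / (1j * y - 1)
    return -y * g.imag

# theta decreases to mu({0}) = 1/2 as y decreases to 0
ys = [10.0, 1.0, 0.1, 0.01]
vals = [theta(y) for y in ys]
```

A short computation gives the closed form \(\theta _\mu (y) = \frac{1}{2} + \frac{1}{2}\frac{y^2}{1+y^2}\) for this toy measure, consistent with the integral representation below.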
We start by collecting some properties of \(\theta _\mu \):
Proposition 4.1
Let \(\mu \) be a finite Borel measure on the real line \(\mathbb {R}\). Then the following statements hold true:
(i)
For every \(y\in \mathbb {R}^+\) one has
$$\begin{aligned} \theta _\mu (y) = \int _\mathbb {R}\frac{y^2}{y^2+t^2}\, d\mu (t) \quad \text {and}\quad \theta _\mu '(y) = \int _\mathbb {R}\frac{2yt^2}{(y^2+t^2)^2}\, d\mu (t); \end{aligned}$$
in particular, \(\theta _\mu \) is increasing on \(\mathbb {R}^+\).
(ii)
One has \(\lim _{y\rightarrow \infty } \theta _\mu (y) = \mu (\mathbb {R})\) and \(\lim _{y\searrow 0} \theta _\mu (y) = \mu (\{0\})\).
(iii)
For all \(y\in \mathbb {R}^+\), one has \(\theta _\mu (y) \ge \mu (\{0\})\).
(iv)
For all \(y\in \mathbb {R}^+\), one has \(\theta _\mu '(y) \le \frac{\mu (\mathbb {R})}{2y}\).
Proof
(i)
For every \(y\in \mathbb {R}^+\), we compute \(g_\mu (iy) = \int _\mathbb {R}\frac{-iy-t}{y^2+t^2}\, d\mu (t)\), so that \(iy\, g_\mu (iy) = \int _\mathbb {R}\frac{y^2 - iyt}{y^2+t^2}\, d\mu (t)\).
From this, we easily deduce the asserted integral representations of \(\theta _\mu \) and \(\theta _\mu '\); the latter tells us that \(\theta _\mu \) is increasing.
(ii)
Since \(\theta _\mu (y)=\operatorname {Re}(iy g_\mu (iy))\) for all \(y\in \mathbb {R}^+\), these assertions are immediate consequences of the well-known facts that \(\lim _{\sphericalangle z\rightarrow \infty } z g_\mu (z) = \mu (\mathbb {R})\) and \(\lim _{\sphericalangle z \rightarrow 0} z g_\mu (z) = \mu (\{0\})\). More directly, these statements can be deduced from the integral representation of \(\theta _\mu \) derived in (i) using Lebesgue's dominated convergence theorem; note that at each point \(t \in \mathbb {R}\), the integrand satisfies \(\frac{y^2}{y^2 + t^2} \rightarrow 1\) as \(y\rightarrow \infty \) and \(\frac{y^2}{y^2 + t^2} \rightarrow \textbf{1}_{\{0\}}(t)\) as \(y \searrow 0\).
(iii)
Follows from combining the monotonicity of \(\theta _\mu \) established in (i) with the limit \(\lim _{y \rightarrow 0} \theta _\mu (y) = \mu (\{0\})\) stated in (ii). Alternatively, one gets this by bounding the integral representation of \(\theta _\mu \) from (i) using \(\frac{y^2}{y^2 + t^2} \ge \textbf{1}_{\{0\}}(t)\).
(iv)
By the inequality between the arithmetic and the geometric mean, we see that \(\frac{2 y t^2}{(y^2+t^2)^2} \le \frac{1}{2y}\) for each \(y\in \mathbb {R}^+\) and all \(t\in \mathbb {R}\). Thus, using the formula derived in (i) and since \(\mu \) is a finite measure, we conclude that \(\theta _\mu '(y) \le \frac{\mu (\mathbb {R})}{2y}\) for all \(y \in \mathbb {R}^+\) as desired.
\(\square \)
In the applications we are interested in, \(\mu \) is a Borel probability measure and we have access to \(\theta _\mu (y)\) at any point \(y\in \mathbb {R}^+\); we want to use this information to compute \(\mu (\{0\})\). We will work in situations where the possible values of \(\mu (\{0\})\) are limited to a discrete subset of the interval [0, 1]; then, thanks to Proposition 4.1, once we have found a \(y\in \mathbb {R}^+\) for which \(\theta _\mu (y)\) falls below the smallest positive value that \(\mu (\{0\})\) can attain, we can conclude that \(\mu (\{0\})\) must be zero. For this approach, however, we need some techniques to predict how small y must be in order to decide reliably whether \(\mu (\{0\})\) is zero or not.
4.2 Distributions of Regular Type
The problem is addressed in the following proposition for Borel probability measures \(\mu \) of regular type, i.e., those which are of the form \(\mu = \mu (\{0\}) \delta _0 + \nu \) for some Borel measure \(\nu \) satisfying
$$\begin{aligned} \nu ([-r,r]) \le c\, r^\beta \quad \text {for all } 0< r < r_0 \end{aligned}$$
(4.1)
with some \(c \ge 0\), \(\beta \in (0,1]\) and \(r_0>0\); note that \(\nu \) is necessarily finite with \(\nu (\mathbb {R}) = 1-\mu (\{0\})\). We emphasize that (4.1) allows \(c=0\); thus, the class of Borel probability measures of regular type includes those Borel probability measures whose support is contained in \(\mathbb {R}{\setminus } (-r_0,r_0)\) for some \(r_0>0\) except, possibly, an atom at zero.
Proposition 4.2
Let \(\mu \) be a Borel probability measure on \(\mathbb {R}\) which is of regular type such that (4.1) holds. Let \(\gamma :=\frac{2}{2+\beta }\) and \(y_0:=r_0^{1/\gamma }\); then
$$\begin{aligned} \theta _\mu (y) \le \mu (\{0\}) + \bigl (c+\nu (\mathbb {R})\bigr )\, y^{\frac{2\beta }{2+\beta }} \quad \text {for all } 0<y<y_0. \end{aligned}$$
Proof
First of all, we note that \(g_\mu (z) = \mu (\{0\}) \frac{1}{z} + g_\nu (z)\) for all \(z\in \mathbb {C}^+\) and hence \(\theta _\mu (y) = \mu (\{0\}) + \theta _\nu (y)\) for all \(y\in \mathbb {R}^+\).
Now for all \(0<y<y_0\) we have \(r:=y^\gamma <y_0^\gamma =r_0\), so we get
$$\begin{aligned} \nu ([-r,r]) \le c\, r^\beta = c\, y^{\gamma \beta } = c\, y^{\frac{2\beta }{2+\beta }} \end{aligned}$$
and hence, using Proposition 4.1 (i), that \(\theta _\nu (y) \le (c+\nu (\mathbb {R})) y^{\frac{2\beta }{2+\beta }}\).
Combining both results, we obtain the asserted bound. \(\square \)
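A quick numerical check of this bound (Python): the uniform distribution \(\nu \) on \([-1,1]\) has \(\nu ([-r,r]) = r\), so it is of regular type with \(c=1\), \(\beta =1\), \(r_0=1\) (hence \(\gamma = \frac{2}{3}\), \(y_0=1\)), and in this case \(\theta _\nu \) has the closed form \(\theta _\nu (y) = y\arctan (1/y)\):

```python
import numpy as np

def theta_uniform(y):
    """theta_nu(y) for nu the uniform distribution on [-1, 1]:
    integrating y^2/(y^2+t^2) * (1/2) dt over [-1, 1] gives
    y * arctan(1/y)."""
    return y * np.arctan(1.0 / y)

# Proposition 4.2 then gives theta_nu(y) <= (c + nu(R)) y^(2/3) = 2 y^(2/3)
checks = [theta_uniform(y) <= 2 * y ** (2 / 3) for y in [0.5, 0.1, 0.01, 0.001]]
```

Note that the actual decay here is of order y (up to logarithmic-free factors), faster than the guaranteed \(y^{2/3}\); the proposition only provides a worst-case bound.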
4.3 Matrix-valued Semicircular Elements
From [26, 30], we learn that Borel probability measures of regular type arise naturally as analytic distributions of matrix-valued semicircular elements. This class of operators allows us to bridge the gap between the algebraic problem of determining the inner rank of linear matrix pencils over the ring of noncommutative polynomials and the analytic tools originating in free probability theory.
To this end, we need to work in the setting of tracial \(W^*\)-probability spaces. Therefore, we outline some basic facts about von Neumann algebras; for more details, we refer to [3, 31], for instance. Let us recall that a von Neumann algebra \(\mathcal {M}\) is a unital \(*\)-subalgebra of B(H) which is closed with respect to the weak, or equivalently, with respect to the strong operator topology. Since these topologies are weaker than the topology induced by the norm \(\Vert \cdot \Vert \) on B(H), every von Neumann algebra \(\mathcal {M}\) is in particular a unital \(C^*\)-algebra. We call \((\mathcal {M},\tau )\) a tracial \(W^*\)-probability space if \(\mathcal {M}\subseteq B(H)\) is a von Neumann algebra and \(\tau : \mathcal {M}\rightarrow \mathbb {C}\) a state, which is moreover
normal (i.e., according to the characterization of normality given in [3, Theorem III.2.1.4], the restriction of \(\tau \) to the unit ball \(\{x\in \mathcal {M}:\Vert x\Vert \le 1\}\) of \(\mathcal {M}\) is continuous with respect to the weak, or equivalently, with respect to the strong operator topology),
faithful (i.e., if \(x\in \mathcal {M}\) is such that \(\tau (x^*x) = 0\), then \(x=0\)), and
tracial (i.e., we have \(\tau (xy) = \tau (yx)\) for all \(x,y\in \mathcal {M}\)).
To any selfadjoint noncommutative random variable \(x\in \mathcal {M}\), we associate the Borel probability measure \(\mu _x\) on the real line \(\mathbb {R}\) which is uniquely determined by the requirement that \(\mu _x\) encodes the moments of x, i.e., that \(\tau (x^k) = \int _\mathbb {R}t^k\, d\mu _x(t)\) holds for all \(k\in \mathbb {N}_0\); we call \(\mu _x\) the analytic distribution of x.
Of particular interest are (standard) semicircular elements as they constitute the free counterpart of normally distributed random variables in classical probability theory; more precisely, those are selfadjoint noncommutative random variables \(s\in \mathcal {M}\) whose analytic distribution is the semicircular distribution, i.e., we have \(d\mu _s(t) = \frac{1}{2\pi } \sqrt{4-t^2} \textbf{1}_{[-2,2]}(t)\, dt\). We will in the following consider n freely independent semicircular elements \(s_1,\dots ,s_n\). For the definition of free independence (which is the noncommutative analogue of the notion of independence), see Appendix A.1.
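As a sanity check on this density (Python; simple numerical integration on a grid), the even moments of the standard semicircular distribution are the Catalan numbers \(1, 1, 2, 5, 14, \dots \):

```python
import numpy as np
from math import comb

def sc_moment(k):
    """2k-th moment of the standard semicircular distribution,
    integrating t^(2k) * (1/2pi) sqrt(4-t^2) over [-2, 2] numerically."""
    t = np.linspace(-2.0, 2.0, 200001)
    density = np.sqrt(np.clip(4.0 - t ** 2, 0.0, None)) / (2.0 * np.pi)
    return float(np.sum(t ** (2 * k) * density) * (t[1] - t[0]))

moments = [sc_moment(k) for k in range(5)]
```

The odd moments vanish by symmetry; this moment description is one standard way to see that semicircular elements play the role of Gaussians in free probability.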
In the next theorem, we summarize the crucial results from [26, 30].
Theorem 4.3
Let \(s_1,\dots ,s_n\) be freely independent standard semicircular elements in some tracial \(W^*\)-probability space \((\mathcal {M},\tau )\). Consider further some selfadjoint matrices \(a_1,\dots ,a_n\) in \(M_N(\mathbb {C})\). We define the operator
$$\begin{aligned} S := a_1 \otimes s_1 + \ldots + a_n \otimes s_n, \end{aligned}$$
which lives in the tracial \(W^*\)-probability space \((M_N(\mathbb {C}) \otimes \mathcal {M}, \operatorname {tr}_N \otimes \tau )\). Then the following statements hold true:
(i)
The analytic distribution \(\mu _S\) of S is of regular type.
(ii)
The only possible values of \(\mu _S(\{0\})\) are \(\{\frac{k}{N} \mid k=0,1,\dots ,N\}\).
(iii)
The inner rank \(\operatorname {rank}(A)\) of the linear pencil
$$\begin{aligned} A = a_1 \otimes \textbf{x}_1 + \ldots + a_n \otimes \textbf{x}_n \end{aligned}$$
over the ring \(\mathbb {C}\langle \textbf{x}_1,\dots ,\textbf{x}_n\rangle \) of noncommutative polynomials in n formal noncommuting variables \(\textbf{x}_1,\dots ,\textbf{x}_n\) is given by
$$\begin{aligned} \operatorname {rank}(A) = N\,\bigl (1-\mu _S(\{0\})\bigr ). \end{aligned}$$
Before giving the proof of Theorem 4.3, we recall that the Novikov-Shubin invariant \(\alpha (x) \in [0,\infty ] \cup \{\infty ^+\}\) of a positive operator x in some tracial \(W^*\)-probability space \((\mathcal {M},\tau )\) is defined as
$$\begin{aligned} \alpha (x) := \liminf _{t\searrow 0} \frac{\log \bigl (\mu _x([0,t]) - \mu _x(\{0\})\bigr )}{\log (t)} \end{aligned}$$
if \(\mu _x([0,t]) > \mu _x(\{0\})\) is satisfied for all \(t>0\) and as \(\alpha (x):= \infty ^+\) otherwise. We emphasize that \(\infty ^+\) is nothing but a (reasonable) notation which is used to distinguish the case of an isolated atom at 0.
Proof
(i)
It follows from [30, Theorem 5.4] that the Cauchy transform of \(\mu _{S^2}\), the analytic distribution of the positive operator \(S^2\), is algebraic (which also follows from Anderson’s result on preservation of algebraicity [1]). Hence, [30, Lemma 5.14] yields that the Novikov-Shubin invariant \(\alpha (S^2)\) of \(S^2\) is either a non-zero rational number or \(\infty ^+\).
Let us consider the case \(\alpha (S^2)=\infty ^+\) first. Then 0 is an isolated point of the spectrum of \(S^2\). Using spectral mapping, we infer that 0 is also an isolated point in the spectrum of S. Thus, \(\mu _S\) decomposes as \(\mu _S = \mu _S(\{0\}) \delta _0 + \nu \), where \(\nu \) is supported on \(\mathbb {R}\setminus (-r_0,r_0)\) for some \(r_0>0\) and we conclude that \(\mu _S\) is of regular type.
Now, we consider the case \(\alpha (S^2) \in \mathbb {Q}\cap (0,\infty )\). Choose any \(0< \alpha < \alpha (S^2)\). From the definition of \(\alpha (S^2)\), we infer that \(\mu _{S^2}\) satisfies \(\mu _{S^2}((0,t]) = \mu _{S^2}([0,t]) - \mu _{S^2}(\{0\}) \le t^\alpha \) for sufficiently small t, say for \(0<t<t_0\). Since \(\mu _S([-r,r]) = \mu _{S^2}([0,r^2])\) and \(\mu _S(\{0\}) = \mu _{S^2}(\{0\})\), we infer that \(\mu _S\) satisfies \(\mu _S([-r,r]) - \mu _S(\{0\}) \le r^{2\alpha }\) for all \(0<r<r_0\) with \(r_0:= t_0^{1/2}\). If we set \(\nu := \mu _S - \mu _S(\{0\}) \delta _0\), then \(\nu ([-r,r]) \le r^\beta \) for all \(0<r<r_0\) with \(\beta := 2\alpha \); thus, \(\mu _S\) again is of regular type.
(ii)
This follows by an application of Theorem 1.1 (2) in [30].
(iii)
The formula for the inner rank can be deduced from Theorem 5.21 in [26]; see also Remark 4.4 below.
\(\square \)
Remark 4.4
From [26], we learn that the conclusion of Theorem 4.3 (iii) is not at all limited to n-tuples of freely independent standard semicircular elements. Actually, we may replace \((s_1,\dots ,s_n)\) by any n-tuple \((x_1,\dots ,x_n)\) of selfadjoint operators in a tracial \(W^*\)-probability space \((\mathcal {M},\tau )\) which satisfy the condition \(\Delta (x_1,\dots ,x_n) = n\), where \(\Delta (x_1,\dots ,x_n)\) is the quantity introduced by Connes and Shlyakhtenko in [8, Sect. 3.1.2]. For any such n-tuple \((x_1,\dots ,x_n)\), [26, Theorem 5.21] yields that
$$\begin{aligned} \operatorname {rank}(A) = N\big (1 - \mu _X(\{0\})\big ), \end{aligned}$$
where \(A=a_1 \otimes \textbf{x}_1 + \ldots + a_n \otimes \textbf{x}_n\) is a linear pencil over \(\mathbb {C}\langle \textbf{x}_1,\dots ,\textbf{x}_n\rangle \) with selfadjoint coefficients \(a_1,\dots ,a_n \in M_N(\mathbb {C})\) and \(X:=A(x_1,\dots ,x_n)\) is a selfadjoint noncommutative random variable in \((M_N(\mathbb {C}) \otimes \mathcal {M}, \operatorname {tr}_N \otimes \tau )\). For the sake of completeness, though this is not relevant for our purpose, we stress that even the restriction to linear pencils is not necessary as the result remains true for all selfadjoint \(A\in M_N(\mathbb {C}) \otimes \mathbb {C}\langle \textbf{x}_1,\dots ,\textbf{x}_n\rangle \); see [26, Corollary 5.15].
It might appear now that the fact that our matrix-valued semicircular elements S are of regular type solves the problem of determining the mass of the atom of \(\mu _S\) at zero. And indeed, if we know all of c, \(\beta \), and \(r_0\), then we can compute \(\mu _S(\{0\})\) exactly: choose a y such that
Calling on Theorem 4.3 again, we know that \(\mu _S(\{0\})\) has to be a value from the discrete set \(\{\frac{k}{N}\mid k=0,1,\ldots ,N\}\). In particular, \(N\tilde{\theta }\) has distance less than \(\frac{1}{2}\) to the integer \(N\mu _S(\{0\})\). But then \(\mu _S(\{0\})=\frac{1}{N}[N\tilde{\theta }]\), where [m] denotes the integer closest to m.
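The rounding step above is elementary to carry out; the following sketch (the helper name `certified_atom` is ours, and the input is assumed to satisfy \(|\tilde{\theta } - \mu _S(\{0\})| < \frac{1}{2N}\)) makes it concrete:

```python
from fractions import Fraction

def certified_atom(theta_tilde, N):
    """Round an approximation theta_tilde of mu_S({0}) to the nearest grid
    point k/N; by Theorem 4.3 the true atom lies on this grid, so the result
    is exact once |theta_tilde - mu_S({0})| < 1/(2N)."""
    k = round(N * theta_tilde)  # the integer [N * theta_tilde] closest to N * theta_tilde
    return Fraction(int(k), N)

# e.g. with N = 7 and an approximation 0.29 of an atom of true size 2/7:
assert certified_atom(0.29, 7) == Fraction(2, 7)
```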
Applying Theorem 4.3 one last time, we finally get
$$\begin{aligned} \operatorname {rank}(A) = N - [N\tilde{\theta }]. \end{aligned}$$
However, even though we know that our relevant distributions \(\mu _S\) are all of regular type, we have no general control on their parameters. We hope to achieve further progress on these questions in the future. But right now, Proposition 4.2 cannot be used for the determination of a value \(y_0\) such that the calculation of \(\theta (y_0)\) would give a certificate for the size of the atom at zero, or equivalently, for the rank of A. In the next section we will thus try to use another regularity quantity.
4.4 Fuglede–Kadison Determinant
To begin with, let us give the following definition taken from [12].
Definition 4.5
Let \((\mathcal {M},\tau )\) be a tracial \(W^*\)-probability space. For a (not necessarily selfadjoint) operator \(x\in \mathcal {M}\) its Fuglede–Kadison determinant is defined by
$$\begin{aligned} \Delta (x) := \exp \Big ( \int _0^\infty \log (t)\, d\mu _{|x|}(t) \Big ), \end{aligned}$$
where \(|x|:=(x^*x)^{1/2}\) and \(\mu _{|x|}\) denotes the analytic distribution of |x|.
For any operator \(x=x^*\in \mathcal {M}\) which is invertible as an unbounded operator or, equivalently, satisfies the condition \(\mu _x(\{0\})=0\), the Fuglede–Kadison determinant \(\Delta (x)\) enables us to control the behavior of \(\mu _x\) near zero.
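In the finite-dimensional case \((M_N(\mathbb {C}), \operatorname {tr}_N)\), the Fuglede–Kadison determinant specializes to the geometric mean of the singular values, i.e. \(|\det x|^{1/N}\); the following sketch (function name ours) illustrates Definition 4.5 numerically under this specialization:

```python
import numpy as np

def fk_det(x):
    """Fuglede-Kadison determinant of x in (M_N(C), tr_N):
    Delta(x) = exp(tr_N log|x|), the geometric mean of the singular values."""
    n = x.shape[0]
    sigma = np.linalg.svd(x, compute_uv=False)  # spectrum of |x| = (x* x)^(1/2)
    return float(np.exp(np.log(sigma).sum() / n))

x = np.array([[2.0, 1.0], [0.0, 3.0]])
# for matrices, Delta(x) agrees with |det x|^(1/N):
assert np.isclose(fk_det(x), abs(np.linalg.det(x)) ** 0.5)
```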
Lemma 4.6
Let \(x=x^*\in \mathcal {M}\) with the property \(\mu _x(\{0\})=0\) be given. For all \(0<\varepsilon <\Vert x\Vert \), we have
$$\begin{aligned} \mu _x([-\varepsilon ,\varepsilon ]) \le \frac{\log \big (\Vert x\Vert /\Delta (x)\big )}{\log \big (\Vert x\Vert /\varepsilon \big )}. \end{aligned}$$
Proof
First, we consider the particular case of an operator \(x \ge 0\) with the property \(\mu _x(\{0\})=0\). Take any \(0<\varepsilon <\Vert x\Vert \). We have
$$\begin{aligned} \log \Delta (x) = \int _{(0,\Vert x\Vert ]} \log (t)\, d\mu _x(t) \le \mu _x((0,\varepsilon ]) \log (\varepsilon ) + \big (1-\mu _x((0,\varepsilon ])\big ) \log \Vert x\Vert . \end{aligned}$$
By solving the latter for \(\mu _x((0,\varepsilon ])\), we get the asserted estimate in this case.
For \(x=x^*\) with \(\mu _x(\{0\})=0\), we apply the previous result to \(x^2 \ge 0\); since \(\mu _x([-\varepsilon ,\varepsilon ]) = \mu _{x^2}((0,\varepsilon ^2])\) for all \(\varepsilon >0\), \(\Vert x^2\Vert = \Vert x\Vert ^2\), and \(\Delta (x^2) = \Delta (x)^2\), we see that the asserted estimate holds true in full generality. \(\square \)
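A numerical spot-check of the estimate, in the solved form \(\mu _x([-\varepsilon ,\varepsilon ]) \le \log (\Vert x\Vert /\Delta (x)) / \log (\Vert x\Vert /\varepsilon )\) obtained by rearranging the logarithmic inequality in the proof; the test matrix below is our own choice:

```python
import numpy as np

# x = diag(0.01, 0.5, 1.0, 2.0) in (M_4(C), tr_4): all eigenvalues are nonzero,
# so mu_x({0}) = 0 and the estimate of Lemma 4.6 applies for 0 < eps < ||x||.
sigma = np.array([0.01, 0.5, 1.0, 2.0])
norm_x = sigma.max()                        # ||x||
fk = float(np.exp(np.log(sigma).mean()))    # Delta(x), geometric mean of eigenvalues
for eps in (0.05, 0.3, 1.5):
    mass = float(np.mean(sigma <= eps))     # mu_x([-eps, eps]) for this positive x
    bound = np.log(norm_x / fk) / np.log(norm_x / eps)
    assert mass <= bound
```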
Corollary 4.7
Let \(x=x^*\in \mathcal {M}\) with \(\mu _x(\{0\})=0\) and \(\delta >0\) be given.
(i)
If \(0< \varepsilon < \Vert x\Vert \Big (\frac{\Delta (x)}{\Vert x\Vert }\Big )^{2/\delta }\), then \(\mu _x([-\varepsilon ,\varepsilon ]) < \frac{\delta }{2}\).
(ii)
If \(0< y < \Vert x\Vert \Big (\frac{\Delta (x)}{\Vert x\Vert }\Big )^{2/\delta } \Big (\frac{\delta }{2}\Big )^{1/2}\), then \(\theta _{\mu _x}(y) < \delta \).
Proof
Note that \(\Delta (x) \le \Vert x\Vert \); thus, (i) is an immediate consequence of the estimate provided in Lemma 4.6. For the proof of (ii), we proceed as follows. For \(0< y < \Vert x\Vert \Big (\frac{\Delta (x)}{\Vert x\Vert }\Big )^{2/\delta } \Big (\frac{\delta }{2}\Big )^{1/2}\) and \(0< \varepsilon < \varepsilon _0:= \Vert x\Vert \Big (\frac{\Delta (x)}{\Vert x\Vert }\Big )^{2/\delta }\), we get by (i), as in the proof of Proposition 4.2, that
and hence, by using Proposition 4.1 (i), that \(\theta _{\mu _x}(y) \le \frac{\delta }{2} + \big (\frac{y}{\varepsilon }\big )^2\). By letting \(\varepsilon \nearrow \varepsilon _0\), we obtain \(\theta _{\mu _x}(y) \le \frac{\delta }{2} + \big (\frac{y}{\varepsilon _0}\big )^2 < \delta \), as asserted. \(\square \)
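The threshold appearing in Corollary 4.7 (ii) is explicit and easy to evaluate; a small sketch (function name ours), where any \(y\) strictly below the returned value is admissible:

```python
import numpy as np

def admissible_y(norm_x, fk_delta, delta):
    """Threshold of Corollary 4.7 (ii): every 0 < y strictly below
    ||x|| * (Delta(x)/||x||)^(2/delta) * (delta/2)^(1/2)
    satisfies theta_{mu_x}(y) < delta."""
    return norm_x * (fk_delta / norm_x) ** (2.0 / delta) * np.sqrt(delta / 2.0)

# e.g. with ||x|| = 2 and Delta(x) = e^{-1/2} (the worst-case values of Sect. 4.4):
y0 = admissible_y(2.0, np.exp(-0.5), delta=1.0)
assert 0.0 < y0 < 2.0
```

Note that the threshold shrinks rapidly as \(\delta \) decreases, which is the computational price for certifying a smaller atom bound.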
The usefulness of the Fuglede–Kadison determinant in our context comes from the following uniform lower estimate for \(\Delta (S)\) from [25]:
Theorem 4.8
(Corollary 1.4, [25]) For integer matrices \(a_i\in M_N(\mathbb {Z})\), consider the matrix-valued semicircular operator \(S=\sum _{i=1}^n a_i\otimes s_i\). If S is invertible as an unbounded operator, then we have for its Fuglede–Kadison determinant that \(\Delta (S)\ge e^{-\frac{1}{2}}\).
Combining Corollary 4.7 with Theorem 4.8 and the fact [24] that \(\Vert S\Vert \le 2\Vert \eta \Vert ^{1/2}\) (and that \(\Vert \eta \Vert =\Vert \eta (\textbf{1})\Vert \) for the positive map \(\eta \)) gives then our following key estimate.
Corollary 4.9
Consider, for selfadjoint integer matrices \(a_i\in M_N(\mathbb {Z})\), the matrix-valued semicircular operator \(S=\sum _{i=1}^n a_i\otimes s_i\). Consider \(0<\delta <2\) and put
$$\begin{aligned} y_0 := 2 \Vert \eta (\textbf{1})\Vert ^{1/2} \Big (\frac{e^{-1/2}}{2\Vert \eta (\textbf{1})\Vert ^{1/2}}\Big )^{2/\delta } \Big (\frac{\delta }{2}\Big )^{1/2}. \end{aligned}$$
If S is invertible as an unbounded operator, then \(\theta _S(y_0)\le \delta \).
Let us summarize our observations. By Theorem 4.3, the problem of determining the inner \(\operatorname {rank}(A)\) of a selfadjoint linear pencil \(A = a_1 \otimes \textbf{x}_1 + \ldots + a_n \otimes \textbf{x}_n\) in \(M_N(\mathbb {C}) \otimes \mathbb {C}\langle \textbf{x}_1,\dots ,\textbf{x}_n\rangle \) has been reduced to the computation of \(\mu _S(\{0\}) \in \{\frac{k}{N} \mid k=0,1,\dots ,N\}\) for the matrix-valued semicircular element \(S = a_1 \otimes s_1 + \ldots + a_n \otimes s_n\). Corollary 4.7 yields upper bounds for \(\mu _S(\{0\})\) in terms of \(\theta _{\mu _S}(y)\) for sufficiently small \(y>0\). By definition of \(\theta _{\mu _S}\), we thus have to compute \(g_{S}\) — or at least approximations thereof with good control on the approximation error. This goal is achieved in the Appendix B for general operator-valued semicircular elements by making use of results of [19]. Note that in the particular case of matrix-valued semicircular elements S which we addressed above, one could alternatively use the more general operator-valued subordination techniques from [5, 18, 22] since \((a_1\otimes s_1,\dots ,a_n \otimes s_n)\) are freely independent with amalgamation over \(M_N(\mathbb {C})\); this approach, however, becomes computationally much more expensive as n grows.
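To illustrate the pipeline end to end, here is a minimal numerical sketch. It assumes the convention \(\theta _{\mu _S}(y) = -y \operatorname {Im} g_S(iy)\) (an assumption on our part, consistent with the estimates around Proposition 4.1) and uses the fixed point iteration of Appendix B for \(G_S\); all function names are ours:

```python
import numpy as np

def atom_estimate(coeffs, y, tol=1e-12, max_iter=200000):
    """Approximate mu_S({0}) for S = sum_i a_i (x) s_i: solve the fixed point
    equation w = (iy*1 - eta(w))^{-1} with eta(w) = sum_i a_i w a_i, read off
    theta = -y * Im g_S(iy), and round N*theta to the admissible grid k/N."""
    N = coeffs[0].shape[0]
    b = 1j * y * np.eye(N)
    w = -1j * np.eye(N)  # starting point in the lower half-plane H^-(M_N(C))
    for _ in range(max_iter):
        w_next = np.linalg.inv(b - sum(a @ w @ a for a in coeffs))
        if np.linalg.norm(w_next - w) < tol:
            break
        w = w_next
    g = np.trace(w_next) / N   # scalar Cauchy transform g_S(iy) = tr_N G_S(iy*1)
    theta = -y * g.imag
    return round(N * theta) / N

# S = diag(s_1, 0): the pencil A = diag(x_1, 0) has inner rank 1 in M_2,
# which corresponds to the atom mu_S({0}) = 1/2.
assert atom_estimate([np.diag([1.0, 0.0])], y=1e-3) == 0.5
```

For very small \(y\) the contraction factor of the iteration approaches 1, which is why the a priori and a posteriori error bounds of Appendix B matter in practice.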
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
A Brief Introduction to Operator-Valued Free Probability Theory
Free probability theory, both in the scalar- and the operator-valued case, uses the language of noncommutative probability theory. While many fundamental concepts can be discussed already in some purely algebraic framework, it is more appropriate for our purposes to work in the setting of \(C^*\)-probability spaces. Thus, we stick right from the beginning to this framework. For a comprehensive introduction to the subject of \(C^*\)-algebras, we refer for instance to [3, Chapter II] and the references listed therein; for the readers’ convenience, we recall the terminology used in the sequel.
By a \(C^*\)-algebra, we mean a (complex) Banach \(*\)-algebra \(\mathcal {A}\) in which the identity \(\Vert x^*x\Vert = \Vert x\Vert ^2\) holds for all \(x\in \mathcal {A}\); if \(\mathcal {A}\) has a unit element \(\textbf{1}\), we call \(\mathcal {A}\) a unital \(C^*\)-algebra. A continuous linear functional \(\phi : \mathcal {A}\rightarrow \mathbb {C}\) which is positive (meaning that \(\phi (x^*x) \ge 0\) holds for all \(x\in \mathcal {A}\)) and satisfies \(\Vert \phi \Vert = 1\) is called a state. According to the Gelfand-Naimark theorem (see [3, Corollary II.6.4.10]), every \(C^*\)-algebra \(\mathcal {A}\) admits an isometric \(*\)-representation on some associated Hilbert space \((H,\langle \cdot ,\cdot \rangle )\); hence \(\mathcal {A}\) can be identified with a norm-closed \(*\)-subalgebra of B(H). Each vector \(\xi \in H\) of length \(\Vert \xi \Vert =1\) induces then a state \(\phi : \mathcal {A}\rightarrow \mathbb {C}, x\mapsto \langle x \xi ,\xi \rangle \); such prototypical states are called vector states.
The Scalar-Valued Case
A \(C^*\)-probability space is a tuple \((\mathcal {A},\phi )\) consisting of a unital \(C^*\)-algebra \(\mathcal {A}\) and some distinguished state \(\phi : \mathcal {A}\rightarrow \mathbb {C}\) to which we shall refer as the expectation on \(\mathcal {A}\); elements of \(\mathcal {A}\) will be called noncommutative random variables.
Let \((\mathcal {A}_i)_{i\in I}\) be a family of unital subalgebras of \(\mathcal {A}\). We say that \((\mathcal {A}_i)_{i\in I}\) are freely independent if \(\phi (x_1 \cdots x_n)=0\) holds for every choice of a finite number \(n\in \mathbb {N}\) of elements \(x_1,\dots ,x_n\) with \(x_j \in \mathcal {A}_{i_j}\) and \(\phi (x_j)=0\) for \(j=1,\dots ,n\), where \(i_1,\dots ,i_n \in I\) are indices satisfying \(i_1 \ne i_2, i_2 \ne i_3, \ldots , i_{n-1} \ne i_n\). A family \((x_i)_{i\in I}\) of noncommutative random variables in \(\mathcal {A}\) is said to be freely independent if the unital subalgebras generated by the \(x_i\)’s are freely independent in the aforementioned sense.
To any selfadjoint noncommutative random variable \(x\in \mathcal {A}\), we associate the Borel probability measure \(\mu _x\) on the real line \(\mathbb {R}\) which is uniquely determined by the requirement that \(\mu _x\) encodes the moments of x, meaning that \(\phi (x^k) = \int _\mathbb {R}t^k\, d\mu _x(t)\) holds true for all \(k\in \mathbb {N}_0\); we call \(\mu _x\) the (analytic) distribution of x.
For the Cauchy transform \(g_{\mu _x}\) of \(\mu _x\) as defined in Sect. 4.1, we have \(g_{\mu _x}(z) = \phi ((z\textbf{1}- x)^{-1})\) for all \(z\in \mathbb {C}^+\). Therefore, we shall write \(g_x\) for \(g_{\mu _x}\) and refer to it as the Cauchy transform of x.
Of particular interest are (standard) semicircular elements as they constitute the free counterpart of normally distributed random variables in classical probability theory; more precisely, those are selfadjoint noncommutative random variables \(s\in \mathcal {A}\) whose analytic distribution is the semicircular distribution, i.e., we have \(d\mu _s(t) = \frac{1}{2\pi } \sqrt{4-t^2} \textbf{1}_{[-2,2]}(t)\, dt\).
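As a quick numerical sanity check of this density: its even moments are the Catalan numbers and its odd moments vanish, which the following sketch (our own helper, simple trapezoid quadrature) verifies:

```python
import numpy as np
from math import comb

def semicircle_moment(k, n_grid=200001):
    """k-th moment of the standard semicircular distribution,
    int t^k * sqrt(4 - t^2)/(2*pi) dt over [-2, 2], by the trapezoid rule."""
    t = np.linspace(-2.0, 2.0, n_grid)
    f = t**k * np.sqrt(np.clip(4.0 - t * t, 0.0, None)) / (2.0 * np.pi)
    h = t[1] - t[0]
    return float(h * (f[:-1] + f[1:]).sum() / 2.0)

# even moments are the Catalan numbers, odd moments vanish:
catalan = lambda m: comb(2 * m, m) // (m + 1)
assert abs(semicircle_moment(2) - catalan(1)) < 1e-5   # second moment = 1
assert abs(semicircle_moment(4) - catalan(2)) < 1e-5   # fourth moment = 2
assert abs(semicircle_moment(3)) < 1e-12               # odd moments = 0
```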
The Operator-Valued Case
Roughly speaking, the step from the scalar- to the operator-valued case is done by allowing an arbitrary unital \(C^*\)-algebra \(\mathcal {B}\) which is unitally embedded in \(\mathcal {A}\) to take over the role of the complex numbers. Formally, an operator-valued \(C^*\)-probability space is a triple \((\mathcal {A},\mathbb {E},\mathcal {B})\) consisting of a unital \(C^*\)-algebra \(\mathcal {A}\), a \(C^*\)-subalgebra \(\mathcal {B}\) of \(\mathcal {A}\) containing the unit element \(\textbf{1}\) of \(\mathcal {A}\), and a conditional expectation \(\mathbb {E}: \mathcal {A}\rightarrow \mathcal {B}\); the latter means that \(\mathbb {E}\) is positive (in the sense that \(\mathbb {E}\) maps positive elements in \(\mathcal {A}\) to positive elements in \(\mathcal {B}\)) and further satisfies \(\mathbb {E}[b] = b\) for all \(b\in \mathcal {B}\) and \(\mathbb {E}[b_1 x b_2] = b_1 \mathbb {E}[x] b_2\) for each \(x\in \mathcal {A}\) and all \(b_1,b_2\in \mathcal {B}\).
Also the notion of free independence admits a natural extension to the operator-valued setting. Let \((\mathcal {A}_i)_{i\in I}\) be a family of unital subalgebras of \(\mathcal {A}\) with \(\mathcal {B}\subseteq \mathcal {A}_i\) for \(i\in I\). We say that \((\mathcal {A}_i)_{i\in I}\) are freely independent with amalgamation over \(\mathcal {B}\) if \(\mathbb {E}[x_1 \cdots x_n]=0\) holds for every choice of a finite number \(n\in \mathbb {N}\) of elements \(x_1,\dots ,x_n\) with \(x_j \in \mathcal {A}_{i_j}\) and \(\mathbb {E}[x_j]=0\) for \(j=1,\dots ,n\), where \(i_1,\dots ,i_n \in I\) are indices satisfying \(i_1 \ne i_2\), \(i_2 \ne i_3\), \(\ldots \), \(i_{n-1} \ne i_n\). A family \((x_i)_{i\in I}\) of noncommutative random variables in \(\mathcal {A}\) is said to be freely independent with amalgamation over \(\mathcal {B}\) if the unital subalgebras generated by \(\mathcal {B}\) and the \(x_i\)’s are freely independent in the aforementioned sense.
In contrast to the scalar-valued case where it was possible to capture the moments of a single selfadjoint noncommutative random variable by a Borel probability measure on \(\mathbb {R}\), such a handy description fails in the generality of operator-valued \(C^*\)-probability spaces. Therefore, we take a more algebraic position. Let \(\mathcal {B}\langle \textbf{x}\rangle \) be the \(*\)-algebra freely generated by \(\mathcal {B}\) and a formal selfadjoint variable \(\textbf{x}\). We shall refer to the \(\mathcal {B}\)-bimodule map \(\mu _x: \mathcal {B}\langle \textbf{x}\rangle \rightarrow \mathcal {B}\) determined by
$$\begin{aligned} \mu _x(\textbf{x}b_1 \textbf{x}\cdots \textbf{x}b_{k-1} \textbf{x}):= \mathbb {E}[x b_1 x \cdots x b_{k-1} x] \end{aligned}$$
as the \(\mathcal {B}\)-valued noncommutative distribution ofx.
Let \(\eta : \mathcal {B}\rightarrow \mathcal {B}\) be a positive linear map. A selfadjoint noncommutative random variable s in \((\mathcal {A},\mathbb {E},\mathcal {B})\) is called a centered \(\mathcal {B}\)-valued semicircular element with covariance \(\eta \) if
$$\begin{aligned} \mu _s(b_0 \textbf{x}b_1 \textbf{x}\cdots \textbf{x}b_{k-1} \textbf{x}b_k) = \sum _{\pi \in \operatorname {NC}_2(k)} \eta _\pi (b_0,b_1,\dots ,b_k) \end{aligned}$$
for all \(k\in \mathbb {N}\) and \(b_0,b_1,\dots ,b_k\in \mathcal {B}\), where \(\operatorname {NC}_2(k)\) denotes the set of all non-crossing pairings on \(\{1,\dots ,k\}\) and \(\eta _\pi : \mathcal {B}^{k+1} \rightarrow \mathcal {B}\) is given by applying \(\eta \) to its arguments according to the block structure of \(\pi \in \operatorname {NC}_2(k)\). Note that \(\operatorname {NC}_2(k)\) is empty if k is odd; consequently, \(\mu _s(b_0 \textbf{x}b_1 \textbf{x}\cdots \textbf{x}b_{k-1} \textbf{x}b_k) = 0\) whenever k is odd. For even k, we can compute \(\eta _\pi \) in a recursive way: every \(\pi \in \operatorname {NC}_2(k)\) contains a block of the form \((p,p+1)\) for some \(1\le p \le k-1\); if \(\pi ' \in \operatorname {NC}_2(k-2)\) is obtained from \(\pi \) by removing the block \((p,p+1)\), then
$$\begin{aligned} \eta _\pi (b_0,\dots ,b_k) = \eta _{\pi '}\big (b_0,\dots ,b_{p-1}\, \eta (b_p)\, b_{p+1},\dots ,b_k\big ). \end{aligned}$$
More generally, a \(\mathcal {B}\)-valued semicircular element with covariance \(\eta \) means a selfadjoint noncommutative random variable s in \((\mathcal {A},\mathbb {E},\mathcal {B})\) whose centered version \(s^\circ := s - \mathbb {E}[s]\) is a centered \(\mathcal {B}\)-valued semicircular element with covariance \(\eta \).
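The combinatorics behind the moment formula can be made concrete: the recursion on pair blocks also gives a way to enumerate \(\operatorname {NC}_2(k)\) itself. A small sketch (our own helper, 0-based indices) builds all non-crossing pairings and checks that their number is the Catalan number:

```python
from math import comb

def nc_pairings(k):
    """All non-crossing pairings of {0, ..., k-1} as lists of pairs: position 0
    pairs with some odd position p; non-crossing forces the region strictly
    inside (1..p-1) and the region outside (p+1..k-1) to pair among themselves."""
    if k == 0:
        return [[]]
    if k % 2 == 1:
        return []  # NC_2(k) is empty for odd k
    result = []
    for p in range(1, k, 2):
        for inner in nc_pairings(p - 1):
            for outer in nc_pairings(k - p - 1):
                result.append([(0, p)]
                              + [(a + 1, b + 1) for a, b in inner]
                              + [(a + p + 1, b + p + 1) for a, b in outer])
    return result

catalan = lambda m: comb(2 * m, m) // (m + 1)
assert [len(nc_pairings(k)) for k in (2, 4, 6, 8)] == [catalan(m) for m in (1, 2, 3, 4)]
```

Summing \(\eta _\pi \) over this enumeration reproduces the \(\mathcal {B}\)-valued moments, although in practice the Cauchy-transform approach below scales much better.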
Example A.1
Let \(s_1,\dots ,s_n\) be freely independent standard semicircular elements living in some \(C^*\)-probability space \((\mathcal {A},\phi )\). Furthermore, let \(a_0,a_1,\dots ,a_n \in M_N(\mathbb {C})\) be selfadjoint. In the operator-valued \(C^*\)-probability space \((M_N(\mathbb {C}) \otimes \mathcal {A}, \mathbb {E}, M_N(\mathbb {C}))\), where \(\mathbb {E}\) denotes the conditional expectation from \(M_N(\mathbb {C}) \otimes \mathcal {A}\) to \(M_N(\mathbb {C}) \subseteq M_N(\mathbb {C}) \otimes \mathcal {A}\) which is given by \(\mathbb {E}:= \operatorname {id}_{M_N(\mathbb {C})} \otimes \phi \), we consider the noncommutative random variable
$$\begin{aligned} S := a_0 \otimes \textbf{1}+ a_1 \otimes s_1 + \ldots + a_n \otimes s_n. \end{aligned}$$
We find that S is an \(M_N(\mathbb {C})\)-valued semicircular element with covariance \(\eta : M_N(\mathbb {C}) \rightarrow M_N(\mathbb {C})\) given by \(\eta (b) = \sum ^n_{i=1} a_i b a_i\) and mean \(\mathbb {E}[S] = a_0\).
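The covariance map of Example A.1 is straightforward to realize numerically; a small sketch with coefficient matrices of our own choosing, checking positivity on a rank-one projection:

```python
import numpy as np

def eta(b, coeffs):
    """Covariance map of S = sum_i a_i (x) s_i on M_N(C): eta(b) = sum_i a_i b a_i
    (the a_i are selfadjoint, so each summand is a_i b a_i^*)."""
    return sum(a @ b @ a for a in coeffs)

a1 = np.array([[1.0, 0.0], [0.0, -1.0]])
a2 = np.array([[0.0, 1.0], [1.0, 0.0]])
# eta is positive: it maps the positive rank-one projection p = diag(1, 0)
# to the positive matrix a1 p a1 + a2 p a2 = diag(1, 0) + diag(0, 1) = I:
p = np.diag([1.0, 0.0])
assert np.allclose(eta(p, [a1, a2]), np.eye(2))
```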
Notice that if s is a (centered) \(\mathcal {B}\)-valued semicircular element in \((\mathcal {A},\mathbb {E},\mathcal {B})\) with covariance \(\eta \), then we have in particular \(\eta (b) = \mu _s(\textbf{x}b \textbf{x}) = \mathbb {E}[sbs]\) for every \(b\in \mathcal {B}\). This shows that \(\eta : \mathcal {B}\rightarrow \mathcal {B}\) must be completely positive, that is, not only \(\eta \) but all its ampliations \(\operatorname {id}_{M_n(\mathbb {C})} \otimes \eta : M_n(\mathbb {C}) \otimes \mathcal {B}\rightarrow M_n(\mathbb {C}) \otimes \mathcal {B}\) for \(n\in \mathbb {N}\) are positive linear maps.
Cauchy transforms can also be generalized to the operator-valued realm where they provide an enormously useful analytic tool. For any selfadjoint noncommutative random variable \(x\in \mathcal {A}\), we define the \(\mathcal {B}\)-valued Cauchy transform of x by
$$\begin{aligned} G_x^\mathcal {B}: \mathbb {H}^+(\mathcal {B}) \rightarrow \mathbb {H}^-(\mathcal {B}),\quad b \mapsto \mathbb {E}\big [(b-x)^{-1}\big ], \end{aligned}$$
where \(\mathbb {H}^+(\mathcal {B})\) and \(\mathbb {H}^-(\mathcal {B})\) are the upper and lower half-plane in \(\mathcal {B}\), respectively, i.e., \(\mathbb {H}^\pm (\mathcal {B}):= \{b\in \mathcal {B}\mid \exists \varepsilon >0: \pm \operatorname {Im}(b) \ge \varepsilon \textbf{1}\}\) where we set \(\operatorname {Im}(b):= \frac{1}{2i} (b - b^*)\); notice that \(b\in \mathcal {B}\) belongs to \(\mathbb {H}^\pm (\mathcal {B})\) if and only if \(\pm \operatorname {Im}(b)\) is an invertible positive element in \(\mathcal {B}\). We recall that we have
$$\begin{aligned} \Vert G_x(b)\Vert \le \Vert \operatorname {Im}(b)^{-1}\Vert \end{aligned}$$
for all \(b\in \mathbb {H}^+(\mathcal {B})\). Whenever the underlying \(C^*\)-algebra \(\mathcal {B}\) is clear from the context, we will simply write \(G_x\) instead of \(G_x^\mathcal {B}\).
If \(\phi \) is any state on \(\mathcal {B}\), then \((\mathcal {A},\mathbb {E},\mathcal {B})\) induces the scalar-valued \(C^*\)-probability space \((\mathcal {A},\phi \circ \mathbb {E})\). Accordingly, each operator-valued noncommutative random variable \(x=x^*\in \mathcal {A}\) can also be viewed as a scalar-valued noncommutative random variable; its scalar-valued Cauchy transform \(g_x: \mathbb {C}^+ \rightarrow \mathbb {C}^-\) is determined by its \(\mathcal {B}\)-valued Cauchy transform \(G_x: \mathbb {H}^+(\mathcal {B}) \rightarrow \mathbb {H}^-(\mathcal {B})\) through \(g_x(z) = \phi (G_x(z\textbf{1}))\) for all \(z\in \mathbb {C}^+\).
We know from [29, Theorem 4.1.12] that the \(\mathcal {B}\)-valued Cauchy transform \(G_s: \mathbb {H}^+(\mathcal {B}) \rightarrow \mathbb {H}^-(\mathcal {B})\) of a centered operator-valued semicircular element with covariance \(\eta \) solves the equation
$$\begin{aligned} b\, G_s(b) = \textbf{1}+ \eta \big (G_s(b)\big )\, G_s(b) \quad \text {for all } b\in \mathbb {H}^+(\mathcal {B}). \end{aligned}$$
(A.4)
From [19], we further learn that this equation uniquely determines \(G_s\) among all functions defined on \(\mathbb {H}^+(\mathcal {B})\) and taking values in \(\mathbb {H}^-(\mathcal {B})\).
Even though we are mostly interested in computing the analytic distribution \(\mu _s\) of s, Eq. (A.4) indicates that it is beneficial to approach this scalar-valued problem through operator-valued free probability theory.
Approximations of Cauchy Transforms for Operator-Valued Semicircular Elements
Let s be a (not necessarily centered) \(\mathcal {B}\)-valued semicircular element in some operator-valued \(C^*\)-probability space \((\mathcal {A},\mathbb {E},\mathcal {B})\) with covariance \(\eta : \mathcal {B}\rightarrow \mathcal {B}\) and mean \(\mathbb {E}[s]\).
Note that for the purpose of this paper, it would be sufficient to treat the particular case of operator-valued \(W^*\)-probability spaces of the form \((M_N(\mathbb {C}) \otimes \mathcal {M}, \mathbb {E}, M_N(\mathbb {C}))\) with \(\mathbb {E}:=\operatorname {id}_{M_N(\mathbb {C})} \otimes \tau \) for a tracial \(W^*\)-probability space \((\mathcal {M},\tau )\) and semicircular elements \(S=a_1 \otimes s_1 + \ldots + a_n \otimes s_n\) like in Theorem 4.3. However, the much more general case of operator-valued \(C^*\)-probability spaces can be treated without any additional effort, and we believe that these results are of interest beyond the concrete application in the context of this paper.
We aim at finding a way to numerically approximate the \(\mathcal {B}\)-valued Cauchy transform \(G_s\) of s with good control on the approximation error.
To this end, we build upon the iteration scheme presented in [19], the details of which we shall recall in Appendix B.1. In Appendix B.2, we extract, from [19] and from the proof of the Earle-Hamilton Theorem [10] (see also [15]) on which their approach relies, an a priori bound allowing us to estimate the number of iteration steps needed to reach the desired accuracy. In Appendix B.3, we prove an a posteriori bound for the approximation error providing a termination condition which turns out to be much more appropriate for practical purposes; to this end, we study how the defining Eq. (A.4) behaves under “sufficiently small” perturbations.
Note that it suffices to consider the case of centered \(\mathcal {B}\)-valued semicircular elements; indeed, the given s yields a centered \(\mathcal {B}\)-valued semicircular element \(s^\circ := s - \mathbb {E}[s]\) with the same covariance \(\eta \) whose \(\mathcal {B}\)-valued Cauchy transform is related to that of s by \(G_s(b) = G_{s^\circ }(b-\mathbb {E}[s])\) for all \(b\in \mathbb {H}^+(\mathcal {B})\). Throughout the following subsections, we thus suppose that \(\mathbb {E}[s]=0\); only in the very last Appendix B.4, where the results derived in the preceding subsections are combined to estimate \(\mu _s(\{0\})\), do we return to the general case.
A Fixed Point Iteration for \(G_s\)
In [19] it was shown (actually under some weaker hypothesis regarding \(\eta \)) that approximations of \(G_s\) can be obtained via a certain fixed point iteration (for a slightly modified function in place of \(G_s\)) which is built upon the characterizing Eq. (A.4). In this way, the function \(G_s\) becomes easily accessible to numerical computations.
Theorem B.1
(Proposition 3.2, [19]) Let \(\mathcal {B}\) be a unital \(C^*\)-algebra and let \(\eta : \mathcal {B}\rightarrow \mathcal {B}\) be a positive linear map (not necessarily completely positive). Fix \(b\in \mathcal {B}\). We define a holomorphic function \(h_b: \mathbb {H}^-(\mathcal {B}) \rightarrow \mathbb {H}^-(\mathcal {B})\) by \(h_b(w):= (b - \eta (w))^{-1}\) for every \(w\in \mathbb {H}^-(\mathcal {B})\). Then, \(h_b\) has a unique fixed point \(w_*\) in \(\mathbb {H}^-(\mathcal {B})\) and, for every \(w_0\in \mathbb {H}^-(\mathcal {B})\), the sequence \((h_b^n(w_0))_{n=0}^\infty \) of iterates converges to \(w_*\).
Note that the formulation above differs from that in [19] as we prefer to perform the iteration in \(\mathbb {H}^-(\mathcal {B})\) and not in the right half-plane of \(\mathcal {B}\), i.e., \(\{b\in \mathcal {B}\mid \exists \varepsilon >0: \operatorname {Re}(b) \ge \varepsilon \textbf{1}\}\), where \(\operatorname {Re}(b):= \frac{1}{2}(b+b^*)\).
Clearly, \(w\in \mathbb {H}^-(\mathcal {B})\) is a fixed point of \(h_b\) precisely when it solves
$$\begin{aligned} b w = \textbf{1}+ \eta (w) w. \end{aligned}$$
(B.1)
Thus, for every fixed \(b\in \mathbb {H}^+(\mathcal {B})\), we obtain from Theorem B.1 that \(w = G_s(b)\) is the unique solution of (B.1) and that \(G_s(b) = \lim _{n\rightarrow \infty } h_b^n(w_0)\) for any \(w_0 \in \mathbb {H}^-(\mathcal {B})\). In particular, as asserted above, the \(\mathcal {B}\)-valued Cauchy transform \(G_s: \mathbb {H}^+(\mathcal {B}) \rightarrow \mathbb {H}^-(\mathcal {B})\) is uniquely determined by (A.4) among all functions on \(\mathbb {H}^+(\mathcal {B})\) with values in \(\mathbb {H}^-(\mathcal {B})\).
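A direct implementation of this iteration for a matrix-valued covariance \(\eta (w) = \sum _i a_i w a_i\) takes only a few lines; for \(N=1\) and \(a_1=1\), the fixed point must reproduce the semicircular Cauchy transform \(g(z) = \frac{1}{2}\big (z - \sqrt{z^2-4}\big )\). A minimal sketch (helper names ours):

```python
import numpy as np

def h_iteration(b, coeffs, tol=1e-13, max_iter=100000):
    """Iterate h_b(w) = (b - eta(w))^{-1} with eta(w) = sum_i a_i w a_i,
    starting from an arbitrary point of H^-(M_N(C)); by Theorem B.1 the
    iterates converge to the unique fixed point w_* = G_S(b)."""
    w = -1j * np.eye(b.shape[0])
    for _ in range(max_iter):
        w_next = np.linalg.inv(b - sum(a @ w @ a for a in coeffs))
        if np.linalg.norm(w_next - w) < tol:
            return w_next
        w = w_next
    return w

z = 0.3 + 1.0j   # a point in the upper half-plane with Re(z) > 0
g = h_iteration(np.array([[z]]), [np.array([[1.0]])])[0, 0]
g_exact = (z - np.sqrt(z * z - 4.0)) / 2.0  # principal branch is correct here
assert abs(g - g_exact) < 1e-9
```

Note that the convergence slows down as \(\operatorname {Im}(b)\) approaches 0, which is exactly the regime needed for atom detection; this is what motivates the quantitative error bounds below.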
Since our goal is to quantitatively control the approximation error for the iteration scheme presented in Theorem B.1, we must take a closer look at its proof as given in [19]. The key ingredient is an important result about fixed points of holomorphic functions between subsets of complex Banach spaces. Before giving the precise statement, let us introduce some terminology. Let \((E,\Vert \cdot \Vert )\) be a (complex) Banach space. A non-empty subset S of E is said to be bounded if there exists \(r>0\) such that \(\Vert x\Vert \le r\) for all \(x\in S\). A subset \(\mathcal {D}\) of E is called a domain if it is open and connected. Further, if \(\mathcal {D}\) is an open subset of E and S any non-empty subset of \(\mathcal {D}\), we say that S lies strictly inside \(\mathcal {D}\) if \(\operatorname {dist}(S,E \setminus \mathcal {D}) > 0\).
Theorem B.2
(Earle-Hamilton Theorem, [10]) Let \(\mathcal {D}\) be a non-empty domain in some complex Banach space \((E,\Vert \cdot \Vert )\). Suppose that \(h: \mathcal {D}\rightarrow \mathcal {D}\) is a holomorphic function for which \(h(\mathcal {D})\) is bounded and lies strictly inside \(\mathcal {D}\). Then h has in \(\mathcal {D}\) a unique fixed point \(w_*\in \mathcal {D}\) and, for every initial point \(w_0\in \mathcal {D}\), the sequence \((h^n(w_0))_{n=1}^\infty \) of iterates converges to \(w_*\).
In order to apply Theorem B.2, the authors of [19] established, for any fixed \(b\in \mathcal {B}\) and for each \(r > \Vert \operatorname {Im}(b)^{-1}\Vert \), that the holomorphic mapping \(h_b: \mathbb {H}^-(\mathcal {B}) \rightarrow \mathbb {H}^-(\mathcal {B})\) restricts to a mapping \(h_b: \mathcal {D}_r \rightarrow \mathcal {D}_r\) of the bounded domain \(\mathcal {D}_r:= \{ w \in \mathbb {H}^-(\mathcal {B}) \mid \Vert w\Vert < r\}\) and that \(h_b(\mathcal {D}_r)\) lies strictly inside \(\mathcal {D}_r\). More precisely, they have shown that
When performing the fixed point iteration in Theorem B.1 as proposed in [19], it clearly is desirable to know a priori how many iteration steps are needed in order to reach an approximation with an error lying below some prescribed threshold. To settle this question, we have to delve into the details of the proof of Theorem B.2 on which Theorem B.1 relies. In doing so, we follow the excellent exposition given in [15].
Let \(\mathcal {D}\) be a non-empty domain in a complex Banach space \((E,\Vert \cdot \Vert )\) and let \(h: \mathcal {D}\rightarrow \mathcal {D}\) be a bounded holomorphic map for which \(h(\mathcal {D})\) lies strictly inside \(\mathcal {D}\), say \(\operatorname {dist}(h(\mathcal {D}), E {\setminus } \mathcal {D}) \ge \varepsilon \) for some \(\varepsilon > 0\). The core idea in the proof of Theorem B.2 is to show that h forms a strict contraction with respect to the Carathéodory-Riffen-Finsler pseudometric (for short, CRF-pseudometric) \(\rho \) on \(\mathcal {D}\), provided that \(\mathcal {D}\) is bounded. The latter restriction, however, is not problematic since one can always replace \(\mathcal {D}\) by the domain \(\widetilde{\mathcal {D}}:= \bigcup _{x\in \mathcal {D}} B_\varepsilon (h(x))\), which is bounded due to the boundedness of h. (For bounded \(\mathcal {D}\), the CRF-pseudometric \(\rho \) is even a metric; see (B.6) below.) Thus, we will assume from now on that \(\mathcal {D}\) is bounded. With no loss of generality, we may also suppose that the initial point \(w_0\) for the iteration lies in \(\widetilde{\mathcal {D}}\); otherwise, we just replace \(w_0\) by \(h(w_0)\) and consider the truncated sequence of iterates.
We recall that the CRF-pseudometric \(\rho \) is defined as
$$\begin{aligned} \rho (x,y) := \inf \{ L(\gamma ) \mid \gamma \in \Gamma (x,y) \}, \end{aligned}$$
where \(\Gamma (x,y)\) stands for the set of all piecewise continuously differentiable curves \(\gamma : [0,1] \rightarrow \mathcal {D}\) with \(\gamma (0)=x\) and \(\gamma (1)=y\) and where \(L(\gamma )\) denotes the length of \(\gamma \in \Gamma (x,y)\); the latter is defined as
$$\begin{aligned} L(\gamma ) := \int _0^1 \alpha \big (\gamma (t),\gamma '(t)\big )\, dt, \qquad \alpha (x,v) := \sup \big \{ |(Dg)(x) v| \,\big |\, g: \mathcal {D}\rightarrow \mathbb {D} \text { holomorphic} \big \}, \end{aligned}$$
with \(\mathbb {D}:= \{z\in \mathbb {C}\mid |z| < 1\}\).
Since we assumed that \(\mathcal {D}\) is bounded, both \(\mathcal {D}\) and \(h(\mathcal {D})\) have finite diameter and one finds that h satisfies, for any fixed \(0 < t \le \operatorname {diam}(h(\mathcal {D}))^{-1} \varepsilon \),
$$\begin{aligned} \rho \big (h(x),h(y)\big ) \le \frac{1}{1+t}\, \rho (x,y) \quad \text {for all } x,y\in \mathcal {D}. \end{aligned}$$
(The strongest bound among these is obtained, of course, for the particular choice \(t=\operatorname {diam}(h(\mathcal {D}))^{-1} \varepsilon \), but for later use, we prefer to keep this flexibility.) With \(q:= \frac{1}{1+t}\), we thus get by Banach’s contraction mapping theorem the a priori bound
$$\begin{aligned} \rho \big (h^n(w_0),w_*\big ) \le \frac{q^n}{1-q}\, \rho \big (h(w_0),w_0\big ). \end{aligned}$$
(B.4)
The final step in the proof of Theorem B.2 consists in the observation that \(\rho \) compares to the metric induced by the norm \(\Vert \cdot \Vert \) like
$$\begin{aligned} \frac{1}{\operatorname {diam}(\mathcal {D})}\, \Vert x-y\Vert \le \rho (x,y) \quad \text {for all } x,y\in \mathcal {D}. \end{aligned}$$
(B.6)
Indeed, combining (B.6) with (B.4), one concludes that the sequence of iterates \((h^n(w_0))_{n=0}^\infty \) must converge to \(w_*\) with respect to \(\Vert \cdot \Vert \).
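If, as in Banach's fixed point theorem, the a priori bound (B.4) takes the form \(\rho (h^n(w_0),w_*) \le \frac{q^n}{1-q} \rho (h(w_0),w_0)\) (the form we assume here), the number of iterations guaranteeing a prescribed accuracy can be computed up front; a sketch (helper name ours):

```python
import math

def iterations_needed(q, rho0, tol):
    """Smallest n with q**n / (1 - q) * rho0 <= tol, i.e. the a priori
    iteration count for contraction factor q and initial gap rho0."""
    assert 0.0 < q < 1.0 and rho0 > 0.0 and tol > 0.0
    if rho0 / (1.0 - q) <= tol:
        return 0
    return math.ceil(math.log(tol * (1.0 - q) / rho0) / math.log(q))

# e.g. contraction factor 0.9, initial gap 1, target accuracy 1e-8:
n = iterations_needed(q=0.9, rho0=1.0, tol=1e-8)
assert 0.9**n / 0.1 <= 1e-8 < 0.9**(n - 1) / 0.1
```

The logarithmic dependence on the tolerance is benign; the difficulty in practice is that \(q\) approaches 1 as \(\operatorname {Im}(b)\) shrinks, which blows up the count and motivates the a posteriori bound of Appendix B.3.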
In the same way as (B.4) yields in combination with (B.6) a bound on \(\Vert h^n(w_0)-w_*\Vert \) in terms of \(\rho (h(w_0),w_0)\), we may derive from (B.5) with the help of (B.6) a bound on \(\Vert h^{n+1}(w_0)-w_*\Vert \) in terms of \(\rho (h^{n+1}(w_0),h^n(w_0))\). From our practical point of view, the involvement of \(\rho \) is somewhat unfavorable; we aim at making these bounds explicit in the sense that they only depend on controllable quantities. This is achieved by the following lemma, which takes its simplest form in the particular case of convex domains \(\mathcal {D}\).
Lemma B.3
Let \(\mathcal {D}\) be a non-empty domain in some complex Banach space \((E,\Vert \cdot \Vert )\) and let \(h: \mathcal {D}\rightarrow \mathcal {D}\) be a holomorphic map for which \(h(\mathcal {D})\) lies strictly inside \(\mathcal {D}\), say \(\operatorname {dist}(h(\mathcal {D}), E {\setminus } \mathcal {D}) \ge \varepsilon > 0\), and which has the property that \(\sup _{w\in \mathcal {D}} \Vert (Dh)(w)\Vert \le M < \infty \). Then, for all \(x,y\in \mathcal {D}\), we have that
$$\begin{aligned} \rho \big (h(x),h(y)\big ) \le \frac{M}{\varepsilon } \inf _{\gamma _0\in \Gamma (x,y)} \int _0^1 \Vert \gamma _0'(t)\Vert \, dt; \end{aligned}$$
if \(\mathcal {D}\) is in addition convex, then \(\rho (h(x),h(y)) \le \frac{M}{\varepsilon } \Vert x-y\Vert \).
Proof
(1)
We claim that \(\alpha (x,v) \le \operatorname {dist}(x,E \setminus \mathcal {D})^{-1} \Vert v\Vert \) for each \((x,v) \in \mathcal {D}\times E\). Obviously, it suffices to treat the case \(v\ne 0\). For each holomorphic \(g: \mathcal {D}\rightarrow \mathbb {D}\), we define a holomorphic function \(g_{x,v}: D(0,r) \rightarrow \mathbb {D}\) on the open disc \(D(0,r) = \{z\in \mathbb {C}\mid |z| < r\}\) by \(g_{x,v}(z):= g(x+zv)\), where r is chosen such that \(\{x+zv \mid z\in D(0,r)\} \subseteq \mathcal {D}\). Note that we may take \(r = \operatorname {dist}(x,E {\setminus } \mathcal {D}) \Vert v\Vert ^{-1}\); by the Cauchy estimates, we thus find that
$$\begin{aligned} |(Dg)(x) v| = |g_{x,v}'(0)| \le \frac{1}{r} = \operatorname {dist}(x,E \setminus \mathcal {D})^{-1} \Vert v\Vert . \end{aligned}$$
The asserted bound for \(\alpha (x,v)\) follows from the latter by taking the supremum over all holomorphic functions \(g: \mathcal {D}\rightarrow \mathbb {D}\). (We point out that the proof actually shows that \(\Vert (Dg)(x)\Vert \le \operatorname {dist}(x,E {\setminus } \mathcal {D})^{-1}\) for all \(x\in \mathcal {D}\).)
(2)
Let \(x,y\in \mathcal {D}\) be given. For every \(\gamma \in \Gamma (x,y)\), we put \(\gamma ^*:= \gamma ([0,1])\) and infer from the bound derived in (1) that
$$\begin{aligned} L(\gamma ) \le \operatorname {dist}(\gamma ^*, E\setminus \mathcal {D})^{-1} \int _0^1 \Vert \gamma '(t)\Vert \, dt. \end{aligned}$$
Again, let \(x,y\in \mathcal {D}\) be given and take any \(\gamma _0 \in \Gamma (x,y)\). Since h is a smooth self-map of \(\mathcal {D}\), the curve \(\gamma := h \circ \gamma _0\) belongs to \(\Gamma (h(x),h(y))\), and because we have \(\gamma ^*\subset h(\mathcal {D})\), the assumption of strict inclusion of \(h(\mathcal {D})\) in \(\mathcal {D}\) guarantees that \(\operatorname {dist}(\gamma ^*,E \setminus \mathcal {D}) \ge \varepsilon \). We conclude with the help of (2) that
By the chain rule, we have \(\gamma '(t) = (D h)(\gamma _0(t)) \gamma _0'(t)\), and by using the assumption of boundedness of Dh, we infer that \(\Vert \gamma '(t)\Vert \le M \Vert \gamma _0'(t)\Vert \) for all \(t\in [0,1]\). Therefore, we may deduce from the previous bound that
As \(\gamma _0\in \Gamma (x,y)\) was arbitrary, taking the infimum over all \(\gamma _0\) yields the first bound asserted in the lemma.
(4)
Suppose that \(\mathcal {D}\) is convex. For \(x,y\in \mathcal {D}\), the curve \(\gamma _0: [0,1] \rightarrow E\) given by \(\gamma _0(t):= ty + (1-t)x\) belongs then to \(\Gamma (x,y)\). Since \(\int ^1_0 \Vert \gamma _0'(t)\Vert \, dt = \Vert x-y\Vert \), the additional assertion follows from the bound established in (3).
\(\square \)
By combining Lemma B.3 with (B.6) and the bounds (B.4) and (B.5) (noting that \(\operatorname {diam}(h(\mathcal {D})) \le \operatorname {diam}(\mathcal {D})\)), we immediately get the following result.
Proposition B.4
Let \(\mathcal {D}\) be a non-empty bounded convex domain in a complex Banach space \((E,\Vert \cdot \Vert )\) and let \(h: \mathcal {D}\rightarrow \mathcal {D}\) be a holomorphic map for which \(h(\mathcal {D})\) lies strictly inside \(\mathcal {D}\), say \(\operatorname {dist}(h(\mathcal {D}), E {\setminus } \mathcal {D}) \ge \varepsilon > 0\), and which has the property that \(\sup _{w\in \mathcal {D}} \Vert (Dh)(w)\Vert \le M < \infty \). Put \(q:= (1+\frac{\varepsilon }{\operatorname {diam}(\mathcal {D})})^{-1}\) and denote by \(w_*\) the unique fixed point of h on \(\mathcal {D}\). Then, for each \(w_0\in h(\mathcal {D})\), we have that
We apply this result in the particular setting of Theorem B.1.
Corollary B.5
Let \(\mathcal {B}\) be a unital \(C^*\)-algebra and let \(\eta : \mathcal {B}\rightarrow \mathcal {B}\) be a positive linear map (not necessarily completely positive). Fix \(b\in \mathcal {B}\) and let \(w_*\) be the unique fixed point of the holomorphic function \(h_b: \mathbb {H}^-(\mathcal {B}) \rightarrow \mathbb {H}^-(\mathcal {B})\) which is defined by \(h_b(w):= (b - \eta (w))^{-1}\) for every \(w\in \mathbb {H}^-(\mathcal {B})\) (equivalently, \(w_*\) is the unique solution of (B.1) in \(\mathbb {H}^-(\mathcal {B})\), or in other words, \(w_*=G_s(b)\) for any centered \(\mathcal {B}\)-valued semicircular element s with covariance \(\eta \)). Finally, choose any \(r > \Vert \operatorname {Im}(b)^{-1}\Vert \) and set
and \(q:= (1+\frac{\varepsilon }{2r})^{-1}\). Then, with \(\mathcal {D}_r = \{w \in \mathbb {H}^-(\mathcal {B}) \mid \Vert w\Vert < r\}\) as introduced in Appendix B.1, for each initial point \(w_0\in h_b(\mathcal {D}_r)\) and all \(n\ge 1\), we have
We know that \(h_b\) maps \(\mathcal {D}_r\) strictly into itself; in fact, due to (B.3), we have that \(\operatorname {dist}(h_b(\mathcal {D}_r), \mathcal {B}{\setminus } \mathcal {D}_r) \ge \varepsilon \). Further, we see that, for \(w\in \mathbb {H}^-(\mathcal {B})\),
and hence \(\Vert (D h_b)(w)\Vert \le \Vert \operatorname {Im}(b)^{-1}\Vert ^2 \Vert \eta \Vert \); thus, with \(M:= \Vert \operatorname {Im}(b)^{-1}\Vert ^2 \Vert \eta \Vert \), it holds true that \(\sup _{w\in \mathcal {D}_r} \Vert (D h_b)(w)\Vert \le M\). Finally, we note that obviously \(\operatorname {diam}(\mathcal {D}_r) = 2r\). Therefore, by applying Proposition B.4 to \(h_b: \mathcal {D}_r \rightarrow \mathcal {D}_r\), we arrive at the asserted bounds. \(\square \)
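To make the iteration concrete, the following is a minimal numerical sketch (not the implementation used for the examples in this paper): it assumes \(\mathcal {B}= M_2(\mathbb {C})\) with the completely positive covariance \(\eta (w) = w\), so that \(h_b(w) = (b - \eta (w))^{-1}\), and runs the plain Banach iteration from a point in \(\mathbb {H}^-(\mathcal {B})\).

```python
import numpy as np

def h_b(w, b, eta):
    # One fixed point step: h_b(w) = (b - eta(w))^{-1}
    return np.linalg.inv(b - eta(w))

# Hypothetical data: B = M_2(C), completely positive covariance eta = id
eta = lambda w: w
y = 1.0
b = 1j * y * np.eye(2)        # b = iy*1 lies in H^+(B)
w = -1j * np.eye(2)           # initial point in H^-(B)

for _ in range(200):          # plain Banach iteration
    w = h_b(w, b, eta)

# For eta = id and b = iy*1 the limit is w_* = -i*omega*1 with
# omega = sqrt(y^2/4 + 1) - y/2 (the scalar semicircle value).
omega = np.sqrt(y ** 2 / 4 + 1) - y / 2
```

With \(y=1\) this gives \(w \approx -0.618\,i\,\textbf{1}\); the geometric convergence predicted by Proposition B.4 becomes visible if one records \(\Vert h_b(w)-w\Vert \) along the way.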
A Termination Condition for the Fixed Point Iteration
The a priori bound (B.7) for the approximation error obtained in Corollary B.5 may not be useful for practical purposes, as the required number of iterations is simply too high; see Appendix B.5.1. Fortunately, the speed of convergence is typically much better than predicted by (B.7). In order to control how much the computed approximations deviate from \(G_s\), it is thus more appropriate to use an a posteriori estimate instead. In contrast to the bound (B.7), which comes for free from Banach’s contraction theorem, Proposition B.6 below exploits the special structure of our situation and provides a significantly improved termination condition; we shall substantiate this claim in Appendix B.5.2. We point out that the result of Proposition B.6 has been taken up and generalized in [23].
Before giving the precise statement, we introduce some further notation. Consider an operator-valued \(C^*\)-probability space \((\mathcal {A},\mathbb {E},\mathcal {B})\) and let \(\phi \) be a state on \(\mathcal {A}\). To every selfadjoint noncommutative random variable x which lives in \(\mathcal {A}\), we associate the function \(\Theta _x: \mathbb {H}^+(\mathcal {B}) \rightarrow \mathbb {R}^+\) which is defined by \(\Theta _x(b):= - \Vert \operatorname {Im}(b)^{-1}\Vert ^{-1} \operatorname {Im}(\phi (G_x(b)))\) for every \(b\in \mathbb {H}^+(\mathcal {B})\). Further, if \(\eta : \mathcal {B}\rightarrow \mathcal {B}\) is completely positive and \(b\in \mathbb {H}^+(\mathcal {B})\), we put \(\Delta _b(w):= b - w^{-1} - \eta (w)\) for every \(w\in \mathbb {H}^-(\mathcal {B})\).
Proposition B.6
Let \((\mathcal {A},\mathbb {E},\mathcal {B})\) be an operator-valued \(C^*\)-probability space and let s be a \(\mathcal {B}\)-valued semicircular element in \(\mathcal {A}\) with covariance \(\eta : \mathcal {B}\rightarrow \mathcal {B}\). Fix \(b\in \mathbb {H}^+(\mathcal {B})\) and let \(w_*:= G_s(b)\in \mathbb {H}^-(\mathcal {B})\) be the unique solution of the Eq. (B.1) on \(\mathbb {H}^-(\mathcal {B})\). Suppose that \(\tilde{w}_*\in \mathbb {H}^-(\mathcal {B})\) is an approximate solution of (B.1) in the sense that
The proof of Proposition B.6 requires preparation. The following lemma gives some Lipschitz bounds for operator-valued Cauchy transforms.
Lemma B.7
Let \((\mathcal {A},\mathbb {E},\mathcal {B})\) be an operator-valued \(C^*\)-probability space and consider \(x=x^*\in \mathcal {A}\). Then, for all \(b_1,b_2\in \mathbb {H}^+(\mathcal {B})\), we have that
Using the standard bound (A.2) and the fact that \(\mathbb {E}\) is a contraction, we obtain (B.12). In order to prove the bound (B.13), we proceed as follows. We start from the resolvent identity \(G_x(b_1) - G_x(b_2) = \mathbb {E}[(b_1-x)^{-1}(b_2-b_1)(b_2-x)^{-1}]\). First of all, we apply \(\phi \) to both sides of this identity and involve the Cauchy-Schwarz inequality for the state \(\phi \circ \mathbb {E}\) on \(\mathcal {A}\), which yields
Next, we notice that \(\operatorname {Im}(b)^{-1} \le \Vert \operatorname {Im}(b)^{-1}\Vert \textbf{1}\) and thus \(\operatorname {Im}(b) \ge \Vert \operatorname {Im}(b)^{-1}\Vert ^{-1} \textbf{1}\) for every \(b\in \mathbb {H}^+(\mathcal {B})\), which allows us to bound
Using this as well as the bound \(\operatorname {Im}(b) \ge \Vert \operatorname {Im}(b)^{-1}\Vert ^{-1} \textbf{1}\) which was already applied in the proof of Lemma B.7, we get for every state \(\phi \) on \(\mathcal {B}\) that
This proves \(\Lambda \in \mathbb {H}^+(\mathcal {B})\); in fact, we see that \(\operatorname {Im}(\Lambda ) \ge (1-\sigma )\Vert \operatorname {Im}(b)^{-1}\Vert ^{-1} \textbf{1}\), which yields the desired bound (B.14).
which tells us that \(\tilde{w}_*\) is the unique solution of (B.9) for \(\Lambda \) instead of b.
For the \(\mathcal {B}\)-valued Cauchy transform \(G_s: \mathbb {H}^+(\mathcal {B}) \rightarrow \mathbb {H}^-(\mathcal {B})\) of s, the latter observation tells us that \(G_s(\Lambda ) = \tilde{w}_*\); recall that by definition \(G_s(b) = w_*\). With the help of the bounds (B.12), (B.14), and (B.9), we thus obtain that
as asserted in (B.11); note that \(\Theta _s(\Lambda ) \le 1\) was used in the third step. \(\square \)
When performing the iteration \((h_b^n(w_0))_{n=1}^\infty \) for fixed \(b \in \mathbb {H}^+(\mathcal {B})\) and any initial point \(w_0 \in \mathbb {H}^-(\mathcal {B})\), the results which have been collected in Corollary B.5 guarantee that \(h^n_b(w_0)\), as \(n\rightarrow \infty \), eventually comes arbitrarily close (with respect to the norm \(\Vert \cdot \Vert \) of \(\mathcal {B}\)) to the unique solution \(w_*\) of (B.1) in \(\mathbb {H}^-(\mathcal {B})\). At first sight, however, it is not clear whether this can always be detected by the termination condition (B.9) formulated in Proposition B.6. The bounds provided by the following lemma prove that this is indeed the case: the sequence \((h_b^n(w_0))_{n=1}^\infty \) converges to the (unique) fixed point \(w_*\in \mathbb {H}^-(\mathcal {B})\) of \(h_b\) if and only if \(\Vert \Delta _b(h^n_b(w_0))\Vert \rightarrow 0\) as \(n \rightarrow \infty \).
Lemma B.8
In the situation of Corollary B.5, if \(r>\Vert \operatorname {Im}(b)^{-1}\Vert \) is given, then we have for all \(w \in h_b(\mathcal {D}_r)\) that
Note that \(h_b(w)^{-1} - w^{-1} = b - \eta (w) - w^{-1} = \Delta _b(w)\), and hence \(w - h_b(w) = h_b(w)\bigl (h_b(w)^{-1} - w^{-1}\bigr )w = h_b(w) \Delta _b(w) w\), from which it follows that \(\Vert h_b(w) - w\Vert \le r^2 \Vert \Delta _b(w)\Vert \) for all \(w \in \mathcal {D}_r\) and thus, in particular, for \(w \in h_b(\mathcal {D}_r)\). From this, the first of the two inequalities stated in the lemma follows.
For the second one, we proceed as follows. From the bound (B.2), we get
Hence, we see that \(\Vert w^{-1}\Vert \le m_r^2 \Vert \operatorname {Im}(b)^{-1}\Vert \) for all \(w \in h_b(\mathcal {D}_r)\). In summary, we may derive from (B.15) that
for all \(w \in h_b(\mathcal {D}_r)\), which is the second inequality stated in the lemma. \(\square \)
Remark B.9
Note that \(h_b(w_2) - h_b(w_1) = h_b(w_2) \eta (w_2-w_1) h_b(w_1)\) for \(w_1,w_2 \in \mathbb {H}^-(\mathcal {B})\). Thus, for \(r>\Vert \operatorname {Im}(b)^{-1}\Vert \), it follows that
i.e., the restriction of \(h_b\) to \(\mathcal {D}_r\) is Lipschitz continuous.
Lemma B.8 ensures that Proposition B.6 can be used as an alternative termination condition in place of Corollary B.5. More precisely, in order to find an approximation \(\tilde{w}_*\) of \(w_*\) which deviates from \(w_*\) with respect to \(\Vert \cdot \Vert \) at most by \(\delta >0\) by using the sequence \((h^n_b(w_0))_{n=1}^\infty \) of iterates of \(h_b\), we may proceed as follows: compute \(w_0, h_b(w_0), \ldots , h^n_b(w_0)\) iteratively until
is satisfied; then \(\tilde{w}_*:= h^n_b(w_0) \in \mathbb {H}^-(\mathcal {B})\) is the desired approximation of \(w_*\), i.e., we have that \(\Vert \tilde{w}_*- w_*\Vert \le \delta \).
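As an illustration of this stopping rule, here is a minimal scalar sketch with hypothetical choices (\(\mathcal {B}=\mathbb {C}\), covariance \(\eta (w)=w\), \(b=iy\)), so that \(\Delta _b(w) = b - w^{-1} - w\). It stops once \(\Vert \Delta _b(w_n)\Vert \) falls below \(\sigma \Vert \operatorname {Im}(b)^{-1}\Vert ^{-1}\) with \(\sigma = \delta /(\delta + \Vert \operatorname {Im}(b)^{-1}\Vert )\), which is chosen so that the guarantee \(\Vert w_n - w_*\Vert \le \frac{\sigma }{1-\sigma }\Vert \operatorname {Im}(b)^{-1}\Vert \) from Proposition B.6 equals \(\delta \).

```python
# Scalar sketch of the Delta_b-based stopping rule (hypothetical choices:
# B = C, covariance eta(w) = w, b = iy).  Here Delta_b(w) = b - 1/w - w.
def solve_fixed_point(y, delta, max_iter=10_000):
    b = 1j * y
    c = 1.0 / y                       # ||Im(b)^{-1}|| for b = iy
    sigma = delta / (delta + c)       # then sigma/(1 - sigma) * c = delta
    w = -1j                           # initial point in H^-(C)
    for _ in range(max_iter):
        w = 1.0 / (b - w)             # one step of h_b(w) = (b - eta(w))^{-1}
        if abs(b - 1.0 / w - w) <= sigma / c:   # ||Delta_b(w)|| small enough
            return w                  # now ||w - w_*|| <= delta
    raise RuntimeError("termination condition not reached")

w_tilde = solve_fixed_point(y=1.0, delta=1e-8)
w_star = -1j * (5 ** 0.5 - 1) / 2     # exact fixed point for y = 1
```

In this scalar case the loop terminates after a few dozen steps, far earlier than the a priori bound of Corollary B.5 would suggest.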
The analogous procedure based on the bound (B.8) stated in Corollary B.5 works as follows: compute \(w_0, h_b(w_0), \ldots , h^n_b(w_0)\) iteratively until
is satisfied; then \(\tilde{w}_*:= h^n_b(w_0) \in \mathbb {H}^-(\mathcal {B})\) satisfies \(\Vert \tilde{w}_*- w_*\Vert \le \delta \). This procedure, however, loses against the one based on Proposition B.6; this can be seen as follows. For \(b\in \mathbb {H}^+(\mathcal {B})\) and the given initial point \(w_0 \in \mathbb {H}^-(\mathcal {B})\), we choose \(r > \max \{\Vert \operatorname {Im}(b)^{-1}\Vert , \Vert w_0\Vert \}\). By applying Lemma B.8 to \(h^n_b(w_0)\), we get that
Therefore, if \(n\ge 2\) is such that the condition (B.17) is satisfied, then \(\Vert \Delta _b(h^n_b(w_0))\Vert \le \frac{1}{4} m_r^4 \varepsilon ^2\delta \). By definition of \(\varepsilon \), we have \(\varepsilon \le m_r^{-2} \Vert \operatorname {Im}(b)^{-1}\Vert ^{-1}\); thus, \(\Vert \Delta _b(h^n_b(w_0))\Vert \le \frac{1}{4} \Vert \operatorname {Im}(b)^{-1}\Vert ^{-2} \delta \). Suppose that \(\frac{1}{4} \Vert \operatorname {Im}(b)^{-1}\Vert ^{-1} \delta < \frac{3}{4}\); then the latter can be rewritten as \(\Vert \Delta _b(h^n_b(w_0))\Vert \le \tilde{\sigma } \Vert \operatorname {Im}(b)^{-1}\Vert ^{-1}\) for \(\tilde{\sigma }:= \frac{1}{4}\Vert \operatorname {Im}(b)^{-1}\Vert ^{-1}\delta \in (0,\frac{3}{4})\); thus, \(\Vert \tilde{w}_*- w_*\Vert \le \frac{\tilde{\sigma }}{1-\tilde{\sigma }} \Vert \operatorname {Im}(b)^{-1}\Vert < \delta \) by (B.10) in Proposition B.6. This means that for fixed \(b\in \mathbb {H}^+(\mathcal {B})\) and each sufficiently small \(\delta >0\), the condition (B.17) stemming from Corollary B.5 breaks off the iteration later than the condition (B.16) derived from Proposition B.6 does.
Estimating the Size of Atoms
The following corollary combines the previously obtained results, yielding a procedure by which one can compute approximations of \(\theta _{\mu _s}\) for operator-valued semicircular elements s; as announced at the beginning of this section, the corollary is formulated without the restriction to centered s.
Corollary B.10
Let \((\mathcal {A},\mathbb {E},\mathcal {B})\) be an operator-valued \(C^*\)-probability space, \(\phi \) a state on \(\mathcal {B}\), and let s be a (not necessarily centered) \(\mathcal {B}\)-valued semicircular element in \(\mathcal {A}\) with covariance \(\eta : \mathcal {B}\rightarrow \mathcal {B}\). Denote by \(\mu _s\) the analytic distribution of s, seen as a noncommutative random variable in the \(C^*\)-probability space \((\mathcal {A},\phi \circ \mathbb {E})\). Then, for every \(y\in \mathbb {R}^+\) and \(\varepsilon >0\), we have that
where \(\tilde{w}_*\in \mathbb {H}^-(\mathcal {B})\) is an approximate solution of (B.1) at the point \(b=iy\textbf{1}-\mathbb {E}[s]\) in the sense that \(\Vert \Delta _b(\tilde{w}_*)\Vert < \frac{y\varepsilon }{1+\varepsilon }\) holds.
Proof
Consider \(s^\circ := s - \mathbb {E}[s]\), which is a centered \(\mathcal {B}\)-valued semicircular element with covariance \(\eta \). For every \(y\in \mathbb {R}^+\), we know that \(w_*:= G_s(iy\textbf{1}) = G_{s^\circ }(b)\) is the unique solution of (B.1) at the point \(b=iy\textbf{1}-\mathbb {E}[s]\). We apply Proposition B.6 for \(\sigma := \frac{\varepsilon }{1+\varepsilon }\); note that by our assumption \(\Vert \Delta _b(\tilde{w}_*)\Vert < \sigma y\) holds. Thus, we obtain from (B.11) that
where we used that \(\Theta _{s^\circ }(b) = \Theta _s(iy\textbf{1}) = \theta _{\mu _s}(y)\). By multiplying the latter inequality with y, we arrive at the asserted bound. \(\square \)
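In the simplest scalar instance (a standard semicircular element, i.e. \(\mathcal {B}=\mathbb {C}\), \(\eta = \operatorname {id}\), \(\mathbb {E}[s]=0\); a hypothetical illustration, not taken from the paper), one has \(G_s(iy) = -i\omega (y)\) with \(\omega (y) = \sqrt{y^2/4+1}-y/2\), hence \(\theta _{\mu _s}(y) = -y\operatorname {Im}(G_s(iy)) = y\,\omega (y)\), which tends to \(\mu _s(\{0\}) = 0\) as \(y\downarrow 0\), consistent with the semicircle distribution having no atoms:

```python
import math

def omega(y):
    # G_s(iy) = -i*omega(y) for the standard semicircle law
    return math.sqrt(y * y / 4 + 1) - y / 2

def theta(y):
    # theta_{mu_s}(y) = -y * Im(G_s(iy)) = y * omega(y)
    return y * omega(y)

# theta(y) tends to the atom size mu_s({0}) = 0 as y -> 0
values = [theta(10.0 ** (-k)) for k in range(5)]
```
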
Example
Let s be a centered \(\mathcal {B}\)-valued semicircular element with covariance \(\eta : \mathcal {B}\rightarrow \mathcal {B}\) satisfying \(\eta (\textbf{1})=\textbf{1}\). As discussed at the end of Appendix B.1, the sequence \((h_b^n(w_0))_{n=1}^\infty \) converges to the \(\mathcal {B}\)-valued Cauchy transform \(G_s(b)\) of s at the point \(b\in \mathbb {H}^+(\mathcal {B})\), for every choice of an initial point \(w_0 \in \mathbb {H}^-(\mathcal {B})\).
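This convergence claim can be checked numerically in the scalar case \(\mathcal {B}=\mathbb {C}\) with \(\eta = \operatorname {id}\) (which satisfies \(\eta (\textbf{1})=\textbf{1}\)). The sketch below, a hypothetical illustration, iterates \(h_b\) from several initial points in \(\mathbb {H}^-(\mathbb {C})\) and compares the limits with the closed-form Cauchy transform of the standard semicircle law:

```python
def G_semicircle(y):
    # Closed form of G_s(iy) for the standard semicircle law
    return -1j * ((y * y / 4 + 1) ** 0.5 - y / 2)

def iterate(w0, y, n=300):
    # n steps of w -> h_b(w) = (b - w)^{-1} at b = iy
    b, w = 1j * y, w0
    for _ in range(n):
        w = 1.0 / (b - w)
    return w

# Different initial points in the lower half-plane, same limit
limits = [iterate(w0, y=0.5) for w0 in (-1j, -0.01j, -5j, 3 - 2j)]
```
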
A Priori Bound
First, we want to compute explicitly the number of iteration steps which the a priori bound (B.7) stated in Corollary B.5 predicts in order to compute \(G_s(b)\) at the point \(b = iy \textbf{1}\) for some \(y>0\) up to an error of at most \(\delta >0\), measured with respect to the norm \(\Vert \cdot \Vert \) on \(\mathcal {B}\).
We proceed as follows. First of all, we have to choose \(r > \frac{1}{y}\); in order to maximize \(\varepsilon = \min \big \{r- \frac{1}{y}, \frac{y}{(y+r)^2}\big \}\), we let r be the unique solution of the equation \((y+r)^2(r-\frac{1}{y}) = y\) on \((\frac{1}{y},\infty )\) (which in fact is the unique solution on \(\mathbb {R}\)). Then \(\varepsilon = r - \frac{1}{y} = \frac{y}{(y + r)^2}\) and \(q = \frac{2r(y+r)^2}{2r(y+r)^2 + y}\). Next, we must choose an initial point \(w_0 \in h_b(\mathcal {D}_r)\); we take \(w_0 = h_b(- i \omega \textbf{1}) = -i \frac{1}{y+\omega } \textbf{1}\) for any \(\omega \in (0,r)\). Then \(h_b(w_0) - w_0 = -i \frac{1}{y(y+\omega )+1} \big (\omega - \frac{1}{y+\omega }\big )\textbf{1}\). Hence, by (B.7),
for all \(n\ge 1\). For the sake of clarity, let us point out that the term \(\omega - \frac{1}{y+\omega }\) vanishes precisely if \(\omega = -\frac{y}{2} + \sqrt{\frac{y^2}{4}+1}\), which yields the fixed point \(-i\omega \textbf{1}\).
In Table 1, we apply this result in a few concrete cases; the table lists the number n of iteration steps which are needed until the right hand side of the inequality (B.18) falls below the given threshold value \(\delta >0\). In each case, we compute n for two different choices of r. Besides the optimal value of r obtained as the unique real solution of the equation \((y+r)^2(r-\frac{1}{y}) = y\), we consider the explicit value \(r = \frac{1}{y} + \big (\frac{1}{\sqrt{2}} - \frac{1}{2}\big )y\). For the latter choice, we have \(r - \frac{1}{y} \ge \frac{y}{(y + r)^2}\) and hence \(\varepsilon = \frac{y}{(y + r)^2}\), so that the bound (B.18) remains true. (In fact, if we try \(r= \frac{1}{y} + q y\) for \(q>0\), then the inequality \(r - \frac{1}{y} \ge \frac{y}{(y + r)^2}\) is equivalent to \(q(1+q)\big (\sqrt{1+q} y + \frac{1}{\sqrt{1+q} y}\big )^2 \ge 1\), and the latter is satisfied in particular if \(q(1+q) = \frac{1}{4}\), which leads to \(q=\frac{1}{\sqrt{2}} - \frac{1}{2}\) as used above.)
Table 1
A priori estimates for the number n of iterations depending on the chosen r resulting from Corollary B.5 in the case of Appendix B.5.1: in each block, the upper r is chosen as the unique solution of \((y+r)^2(r-\frac{1}{y}) = y\), while the lower r is taken as the explicit value \(r = \frac{1}{y} + \big (\frac{1}{\sqrt{2}} - \frac{1}{2}\big )y\)
y       \(\delta \)     r                 n
1.0     0.1      1.2055694304      68
                 1.2071067812      68
1.0     0.01     1.2055694304      96
                 1.2071067812      96
0.1     0.1      10.0009801058     494965
                 10.0207106781     498122
0.1     0.01     10.0009801058     541957
                 10.0207106781     545391
0.01    0.1      100.0000009998    9024967288
                 100.0020710678    9025552583
0.01    0.01     100.0000009998    9485576428
                 100.0020710678    9486190327
Note that \(\omega =1\) in all cases
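The values of r reported in Table 1 can be reproduced numerically. The following sketch (a hypothetical check, not the authors' code) finds the unique root of \((y+r)^2(r-\frac{1}{y}) = y\) on \((\frac{1}{y},\infty )\) by bisection, using that the left-hand side minus y is increasing there, and also evaluates the explicit choice \(r = \frac{1}{y} + (\frac{1}{\sqrt{2}}-\frac{1}{2})y\):

```python
def optimal_r(y, iters=200):
    # Bisection for the unique root of f(r) = (y + r)**2 * (r - 1/y) - y
    # on (1/y, infinity); f is increasing there and f(1/y) = -y < 0.
    f = lambda r: (y + r) ** 2 * (r - 1.0 / y) - y
    lo, hi = 1.0 / y, 1.0 / y + 1.0
    while f(hi) < 0:              # enlarge the bracket if necessary
        hi += 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def explicit_r(y):
    # The explicit choice r = 1/y + (1/sqrt(2) - 1/2) * y from the text
    return 1.0 / y + (2 ** -0.5 - 0.5) * y
```

For \(y = 1\) this reproduces \(r \approx 1.2055694304\) and \(r \approx 1.2071067812\) from Table 1.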
A Posteriori Bounds
Now, we want to compare the termination conditions Corollary B.5 and Proposition B.6. Using well-known facts about continued fractions, one easily finds that
with \(q_\pm = \frac{y}{2} \pm \sqrt{\frac{y^2}{4}+1}\), \(\rho = \frac{q_-}{q_+}\), and \(\alpha = \frac{q_- + \omega }{q_+ + \omega }\); alternatively, the previously stated formula can be proven by mathematical induction. Hence
In Table 2, we give the number of iteration steps which are needed to satisfy (B.17) and (B.16), respectively, for particular choices of y, r, and \(\omega \). These results are in accordance with the observation that the termination condition given in Corollary B.5 breaks off the iteration later than the condition given in Proposition B.6.
Table 2
Comparison of the number of iteration steps needed until the termination condition (B.17) respectively (B.16), derived from the a posteriori estimates stated in Corollary B.5 respectively Proposition B.6, is satisfied in the setting of Appendices B.5.1 and B.5.2
Note that \(\omega =1\) in all cases. Furthermore, the explicit value \(r = \frac{1}{y} + \bigl (\frac{1}{\sqrt{2}} - \frac{1}{2}\bigr )y\) was used in the computations, corresponding to the lower entries in each block of Table 1