Abstract
High-quality random samples of quantum states are needed for a variety of tasks in quantum information and quantum computation. Searching the high-dimensional quantum state space for a global maximum of an objective function with many local maxima or evaluating an integral over a region in the quantum state space are but two exemplary applications of many. These tasks can only be performed reliably and efficiently with Monte Carlo methods, which involve good samplings of the parameter space in accordance with the relevant target distribution. We show how the standard strategies of rejection sampling, importance sampling, and Markov-chain sampling can be adapted to this context, where the samples must obey the constraints imposed by the positivity of the statistical operator. For illustration, we generate sample points in the probability space of qubits, qutrits, and qubit pairs, both for tomographically complete and incomplete measurements. We use these samples for various purposes: establish the marginal distribution of the purity; compute the fractional volume of separable two-qubit states; and calculate the size of regions with bounded likelihood.
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
Many situations in quantum information and quantum computation call for a random sample from the space of quantum states. This can be the numerical testing of the typicality of entanglement among states from a bipartite quantum system, understanding the efficacy of a gate implementation or a noise-protection scheme by examining its performance on randomly selected states, computation of the average of some quantity of interest over a subset of quantum states, integration over a region of states, optimization of a function on the state space with a complicated landscape, etc. In every case, a quantitative conclusion can be drawn only if one first specifies what the word 'random' means, i.e., according to which distribution we are drawing states, and then has an efficient way of sampling from the state space in accordance with that distribution.
In many cases, in the absence of additional information or when desiring caution against pre-biasing the results, what one means by drawing a random sample translates into sampling from a 'uniform' distribution of states that treats every state 'fairly.' This is certainly an appropriate attitude when dealing with a discrete (sub)set of N quantum states, so that the uniform distribution is simply the one that assigns probability 1/N to each state. For a continuous set of states, the notion of a uniform distribution, or more generally, an 'uninformative' distribution [1, 2], is an ill-defined one that depends highly on the choice of parameterization of the state space and the criterion for uniformity. One hence needs to specify the desired distribution and, depending on the choice, how one samples according to that distribution may not be easy or obvious.
One often-used choice of random distribution is defined by writing the state ρ as ρ = UΛU†, with U a Haar-random unitary matrix, and Λ a diagonal nonnegative matrix with entries chosen according to the Lebesgue measure on the simplex of eigenvalues. [3] and [4] describe how one samples from this distribution in a simple and computationally efficient way, even in high-dimensional problems, and employ such samples for the estimation of the proportion of entangled to separable states in bipartite state spaces. Another popular sampling approach is that introduced in [5], where the distribution of states is defined by the sampling method itself: sample from a rotationally invariant measure defined on the set of pure states in a composite system built from the original system and a duplicate copy, and then trace out the duplicate copy to arrive at a mixed state of the system, drawn from the thus-defined rotationally symmetric distribution. Later works, including those of [6] and [7], followed the same idea, but employed Monte Carlo techniques to help with the sampling.
More generally, since the desired distribution from which to sample states can vary according to the situation at hand, one needs a flexible approach to sampling from the state space. Here, we discuss general methods, adapted from the statistics literature to quantum problems, that can be applied to arbitrary distributions. We investigate two sampling strategies: independence sampling [8] and the Markov-chain Monte Carlo (MCMC) method [8–10]. In independence sampling, one generates sample points independently and randomly according to some convenient distribution, and then uses either rejection sampling or importance sampling for approaching the target sample. This algorithm is very simple and straightforward to implement, but can become inefficient for problems where the convenient distribution from which samples are generated is too far from the target distribution. The problems of independence sampling are remedied by the MCMC method, where sample points are generated by means of a Markov-chain random walk, making use of the current sample point to decide on a clever choice for the next sample point so that one approaches the target sample efficiently.
Both sampling strategies—independence sampling and the MCMC algorithm—are useful for all situations that one may encounter, i.e., any dimension, any choice of measurement, even if one does not possess an explicit parameterization of the domain of integration (i.e., the reconstruction space; see section 2). If one has a parameterization of the integration domain, one can also make use of Hamiltonian Monte Carlo (HMC) methods to more efficiently generate a good sample; HMC methods are discussed in [11], and our companion paper [12] deals with applying them to sampling in the reconstruction space.
Samples generated by independence sampling or by MCMC are applied in various examples for illustration. We find the marginal distribution for the purity of qubit, qutrit, or qubit-pair states (figures 2(b) and 3), determine the probability that a two-qubit state of known purity is separable (figure 2(a)), and find the sizes of regions with bounded likelihood for qubit-pair data from a tomographically incomplete measurement (figure 5).
2. Priors and constraints
A general measurement in quantum mechanics is a probability-operator measurement (POM). The POM has outcomes Π₁, Π₂, …, Π_K, which are nonnegative operators, Π_k ≥ 0, with unit sum, ∑_k Π_k = 1. If ρ is the true state, the probability that the kth detector clicks is given by the Born rule,

p_k = tr(ρΠ_k), k = 1, 2, …, K. (1)
All the possible probabilities p = (p₁, p₂, …, p_K) for the chosen POM constitute the probability space. Given p, it is customary to report a ρ for which (1) holds. If there is a choice among several ρs for the same p (which can happen when the POM is not informationally complete), we pick one representative [13], and so have a one-to-one mapping p ↔ ρ. These ρs constitute the reconstruction space ℛ₀. Because of the one-to-one mapping between states and probabilities, we will identify p with ρ, and regions in the probability space with corresponding regions in the reconstruction space. Note that, while the probability space is always convex, it may not be possible to choose a convex reconstruction space.
The measurement data D consist of the observed sequence of detector clicks, with n_k clicks of the kth detector after measuring a total number N = n₁ + n₂ + ⋯ + n_K of copies of the state. The probability of obtaining D, if p is the true state, is the point likelihood

L(D|p) = p₁^{n₁} p₂^{n₂} ⋯ p_K^{n_K}. (2)

It takes on its largest value for the maximum-likelihood estimator [14],

p̂_k = n_k/N. (3)
The positivity of ρ and its normalization to unit trace ensure that p satisfies the basic constraints p_k ≥ 0 and ∑_k p_k = 1. Since the probabilities result from the POM via the Born rule, the positivity of ρ usually implies further constraints on p. For example, consider a qubit measured by a three-outcome trine POM with probabilities

p_k = (1/3)[1 + x cos(2kπ/3) + y sin(2kπ/3)], k = 1, 2, 3, (4)

where x = ⟨σ_x⟩ and y = ⟨σ_y⟩ are expectation values of Pauli operators. These trine probabilities are further constrained by p₁² + p₂² + p₃² ≤ 1/2. Quite generically, there are such additional constraints for the probabilities, and it may not be easy or feasible to state them explicitly for high-dimensional systems measured by many-outcome POMs. A p is called physical or permissible if it satisfies all these constraints.
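As a numerical sanity check of this constraint, one can verify that every Bloch vector inside the unit ball yields trine probabilities obeying p₁² + p₂² + p₃² ≤ 1/2. The short Python sketch below assumes the trine convention p_k = [1 + x cos(2kπ/3) + y sin(2kπ/3)]/3; the function name is ours, for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)

def trine_probs(x, y):
    """Trine probabilities for Bloch components x, y (assumed convention)."""
    k = np.arange(1, 4)
    return (1.0 + x * np.cos(2 * np.pi * k / 3) + y * np.sin(2 * np.pi * k / 3)) / 3.0

# Draw random Bloch vectors uniformly inside the unit ball (physical states)
# and verify the induced quantum constraint sum_k p_k^2 <= 1/2.
v = rng.normal(size=(10000, 3))
v *= (rng.random(10000) ** (1 / 3) / np.linalg.norm(v, axis=1))[:, None]
p = np.array([trine_probs(x, y) for x, y, z in v])

assert np.allclose(p.sum(axis=1), 1.0)             # unit sum
assert np.all(p >= -1e-12)                          # nonnegativity
assert np.all((p**2).sum(axis=1) <= 0.5 + 1e-12)    # quantum constraint
```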
We summarize all constraints in the factor w_cstr(p), which is a product of step functions and delta functions, and vanishes if p is not permissible. For example, for the basic constraints, we have

w₀(p) = η(p₁) η(p₂) ⋯ η(p_K) δ(p₁ + p₂ + ⋯ + p_K − 1) (5)

as a factor in

w_cstr(p) = w₀(p) w_qu(p), (6)

where w_qu(p) is the product of step functions that specify the constraints imposed by quantum mechanics. Then, the volume element of the probability space is

(dp) = dp₁ dp₂ ⋯ dp_K w_cstr(p), (7)

and we have

(dρ) = (dp) w_r(p) (8)

for the volume element of the infinitesimal vicinity of state ρ in ℛ₀, where w_r(p) is the (unnormalized) prior density of our choice. Specifically, we will discuss two choices for w_r(p) in the examples later. The first is the primitive prior,

w_r(p) = 1, (9)
so that the density is uniform in p over the (physical) probability space. The second is known as the Jeffreys prior [15],

w_r(p) = 1/√(p₁ p₂ ⋯ p_K), (10)

which is a common choice of prior when no external prior information is available [1, 2].
Upon multiplying the prior density with the point likelihood for the observed data, we get the (unnormalized) posterior density

w_D(p) = w_r(p) L(D|p). (11)

While sampling from the probability space in accordance with the prior is often relatively easy, sampling according to the posterior w_D(p) is usually difficult. With the prior and posterior densities at hand, we can now define the size and credibility of a region in the reconstruction space. The computation of these region-specific values is an important application of random samples in the context of quantum state estimation [16]; an example is given in section 6.
The size of region ℛ is

S_ℛ = ∫_ℛ (dρ) / ∫_{ℛ₀} (dρ), (12)

and

C_ℛ = ∫_ℛ (dρ) L(D|p) / ∫_{ℛ₀} (dρ) L(D|p) (13)

is its credibility. S_ℛ is the prior content of region ℛ, i.e., the probability that the true state is in ℛ before any data are acquired; C_ℛ is its posterior content, i.e., the probability that the true state is in ℛ conditioned on the data.
If we wish to sample the reconstruction space in accordance with the prior, the fraction of sample points in a region ℛ should be equal to the size of the region; and equal to the credibility when sampling according to the posterior w_D(p). The situation can be reversed: we may have a procedure for generating a random sample, and then this procedure defines the underlying prior; an example is 'prior I' in section 4.
In the context of quantum state estimation, one needs to compute S_ℛ and C_ℛ for a given region ℛ and data D. Since the quantum constraints are generally highly nontrivial, leading to a probability space with complicated boundaries, and the dimension of the probability space grows rapidly (as the square of the dimension for tomographically complete measurements), the integrals in S_ℛ and C_ℛ are difficult to compute directly. The structure of the integrals naturally suggests the use of Monte Carlo methods for evaluation: generate points with a density distributed according to a target w(p) (in our case, w_r(p) w_cstr(p) or w_D(p) w_cstr(p)); the size and credibility are then the ratio of the number of points contained in ℛ to the total number of points in the full reconstruction space ℛ₀.
One also needs a systematic and efficient way of numerically checking if a given p satisfies all quantum constraints, i.e., if the given p is physical—this is required as part of the procedure for evaluating the constraint factor w_cstr(p). For a high-dimensional system measured by a many-outcome POM, the constraints cannot easily be expressed explicitly in terms of inequalities, and can thus only be checked numerically. If the POM is informationally complete, the mapping p → ρ is linear and usually known quite explicitly, and one could just check if it gives a nonnegative ρ. For other POMs, this approach is often not available because the mapping is involved and only defined for physical ps, and then other methods must be used.
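For illustration, here is a minimal sketch of such a physicality check for an informationally complete measurement: the single-qubit tetrahedron POM, assuming outcomes Π_k = (1 + a_k·σ)/4 with tetrahedron unit vectors a_k. Linear inversion gives the Bloch vector r = 3∑_k p_k a_k, and p is physical iff |r| ≤ 1, i.e., iff ρ = (1 + r·σ)/2 is nonnegative. The helper names are ours.

```python
import numpy as np

# One possible choice of tetrahedron unit vectors
A = np.array([[1, 1, 1], [1, -1, -1], [-1, 1, -1], [-1, -1, 1]]) / np.sqrt(3)

def bloch_from_probs(p):
    """Invert p_k = (1 + a_k . r)/4  =>  r = 3 * sum_k p_k a_k."""
    return 3.0 * (p[:, None] * A).sum(axis=0)

def is_physical(p, tol=1e-12):
    """True iff p is a valid probability vector whose reconstructed
    Bloch vector lies inside the unit ball."""
    if np.any(p < -tol) or abs(p.sum() - 1.0) > 1e-9:
        return False
    r = bloch_from_probs(p)
    return bool(np.dot(r, r) <= 1.0 + tol)

# A pure state along a tetrahedron axis gives physical probabilities ...
p_pure = (1.0 + A @ A[0]) / 4.0
# ... but a naive probability vector need not be physical.
p_bad = np.array([0.7, 0.1, 0.1, 0.1])
```

Here `p_bad` sums to one yet reconstructs to |r| = 1.8 > 1, so it violates the quantum constraints.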
In appendix A, we describe an iterative algorithm that performs this physicality check numerically.
3. Independence sampling: rejection sampling and importance sampling
In independence sampling, as the name suggests, sample points are randomly generated independently of one another. In general, it is not straightforward to sample directly from the target distribution w_t(p)—here equal to w_r(p) w_cstr(p) or w_D(p) w_cstr(p) if sampling in accordance with the prior or the posterior. However, as long as we can sample over the probability space (perhaps using a convenient parameterization) with a known reference distribution w_ref(p), we can approach the target distribution by means of rejection sampling or importance sampling [8]. The factor r(p) that relates the target distribution to the reference distribution,

w_t(p) = r(p) w_ref(p), (14)

can be regarded as the ratio 'w_t(p)/w_ref(p),' but this should be done with care since w_t(p) and w_ref(p) usually share singularities of the delta-function kind.
The easiest way of sampling the probability space is often to first sample uniformly in p from the space of probabilities that satisfy only the basic constraints of positivity and unit sum, as specified by the factor w₀(p) of (5). We refer to this space as the basic probability simplex, and the (unnormalized) sampling distribution w_ref(p) is then equal to w₀(p) from (5), i.e., it takes a constant value over the entire simplex, and is zero for ps violating the basic constraints. In appendix B, we describe two algorithms for sampling uniformly over the basic probability simplex.
In rejection sampling, we draw many sample points according to the chosen reference distribution w_ref(p), and then reject (i.e., discard) or accept points in such a way that the remaining sample points are distributed according to the target distribution w_t(p). More specifically, we accept a sample point p with probability

a(p) = r(p)/R, (15)

where R is a constant with R ≥ r(p) for all p. One calls a the acceptance ratio.
Rejection sampling requires one to discard points in accordance with the acceptance ratio, and one ends up with fewer sample points than the initial set drawn from w_ref(p). In importance sampling, instead of discarding points, one attaches a weight to each point to compensate for the difference between the sampling and the target distributions. For sample point p_j, the weight is

W_j = r(p_j). (16)

This weight can be thought of as a multiplicity for each sample point in accordance with the target w_t(p), so that each point counts W_j times in computing the ratio of the number of points in ℛ to the total number of points in ℛ₀ for the value of the integral S_ℛ or C_ℛ. Moreover, the weights should have finite variance for good practical performance and ideally be bounded [17]. But this can be hard to check.
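The following sketch illustrates self-normalized importance sampling on the basic probability simplex. To keep the weights bounded, it uses a smooth toy target ∝ p₁p₂p₃ (a Dirichlet distribution) instead of the singular Jeffreys prior; the exact second moment E[p₁²] = 2·3/(6·7) = 1/7 under this target provides a check. This is an illustration of the weighting mechanics, not the paper's computation.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 200_000

# Reference: uniform on the 2-simplex, i.e., Dirichlet(1, 1, 1)
p = rng.dirichlet([1.0, 1.0, 1.0], size=N)

# Toy target density on the simplex: w_t(p) ∝ p1*p2*p3, i.e., Dirichlet(2, 2, 2).
# (A smooth stand-in; the Jeffreys prior would give heavy-tailed weights.)
w = p.prod(axis=1)        # unnormalized weights r(p) = w_t / w_ref
W = w / w.sum()           # self-normalized importance weights

# Weighted estimate of E[p1^2] under the target; the exact value is 1/7.
est = (W * p[:, 0] ** 2).sum()
```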
For the reference distribution that is uniform on the simplex, w_ref(p) = w₀(p), we have r(p) ∝ w_t(p)/w₀(p) in (15) and (16), with w_t(p) = w_r(p) w_cstr(p) for prior sampling or w_t(p) = w_D(p) w_cstr(p) for posterior sampling. Both in rejection sampling and in importance sampling, unphysical points, i.e., those that satisfy the basic but not the quantum constraints, do not contribute to the integral, since they are either rejected with unit probability in rejection sampling, or carry zero weight in importance sampling. This means that, if ℛ₀ is a small subregion of the basic probability simplex, one ends up with only a small fraction of the sample points contributing finally to the integral. For example, for a three-outcome trine measurement on the single qubit of (4), only π/(3√3) ≈ 60% of the points sampled from w₀(p) are physical (see figure 1). The yield decreases as the dimensionality of the system increases: for a nine-outcome trine-antitrine (TAT) measurement on a qubit pair (see section 6), only about 10% of the points are physical [13]; and for the sixteen-outcome POM used in section 4, only a tiny fraction of the candidate points is accepted.
One also runs into problems when the ratio r(p) is sharply peaked. For example, the Jeffreys prior formally becomes infinite when (at least) one of the p_k is zero. In practice, one never gets a sample point with a p_k that is exactly zero, so that r(p) is never infinite. Still, any sample point in the vicinity of the singular points will have a very large value of r(p). The normalization constant R for the Jeffreys prior is also formally infinite, calling into question the applicability of the rejection sampling procedure. In practice, one can take R as a large constant by approximating the target Jeffreys prior by one with a 'cutoff' value when one or more of the p_k vanish. This still, however, makes the acceptance rate tiny for all ps away from the singular points. Correspondingly, in importance sampling, large weights are attached to the points in the vicinity of these singular points, and the main contribution to the integral then comes from just those few points.
Both the problems of a small physical subregion and of sharply peaked priors stem from the fact that the target distribution can be very different from the reference distribution. Whenever possible, one should start with samples from a reference distribution w_ref(p) that is close to w_t(p). Nevertheless, independence sampling according to a w_ref(p) that is uniform on the basic probability simplex is straightforward to set up, and can provide an easy first estimate of the desired integral, or more generally, a rough first sample.
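As a concrete illustration of rejection sampling with a uniform reference on the simplex, the sketch below generates trine probabilities according to the primitive prior. Since uniform sampling on the simplex corresponds to uniform sampling of (x, y) over a triangle whose inscribed circle is the unit disk, the accepted fraction approaches the area ratio π/(3√3) ≈ 0.60.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200_000

# Reference sample: uniform on the basic probability simplex (K = 3).
p = rng.dirichlet([1.0, 1.0, 1.0], size=N)

# Quantum constraint for the trine (equivalent to x^2 + y^2 <= 1):
physical = (p**2).sum(axis=1) <= 0.5
sample = p[physical]      # these points follow the primitive prior

rate = physical.mean()    # approaches pi/(3*sqrt(3)) ≈ 0.605
```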
4. Example: volume of separable two-qubit states
For a first application, we sample the two-qubit state space and ask how large the volume of the set of separable states (or, conversely, entangled states) is. In [3], a natural prior on the set of states is used that is induced by the Haar measure on the group of unitary matrices and the Lebesgue measure on the simplex of eigenvalues (labeled 'Prior I' in figures 2 and 3). For this prior, numerical results establish the fraction of the mixed two-qubit states that are separable.
Here, we consider the scenario where each of the two qubits is measured separately by the four-outcome tetrahedron POM of [18]. The resulting two-qubit POM (which is informationally complete) has sixteen outcomes with the single constraint of unit sum, so the probability space is fifteen-dimensional. We employ the simple rejection sampling method to generate probabilities in accordance with the primitive prior (labeled 'Prior II' in figures 2 and 3). The acceptance rate is very small because the physical region is a tiny fraction of the whole fifteen-dimensional probability simplex. Then we construct the corresponding density matrices as well as their partial transposes. The well-known Peres–Horodecki criterion states that if a state ρ is separable, then its partial transpose has nonnegative eigenvalues; otherwise ρ is entangled. According to our numerical results, the probability that a randomly generated two-qubit state is separable is much smaller than the value reported in [3]. Clearly, the two priors are quite different.
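A minimal numerical version of this partial-transpose test (the criterion only, not the full sampling pipeline) can be sketched as follows; for two qubits, positivity of the partial transpose is both necessary and sufficient for separability.

```python
import numpy as np

def partial_transpose(rho):
    """Partial transpose on the second qubit of a 4x4 two-qubit state."""
    return rho.reshape(2, 2, 2, 2).transpose(0, 3, 2, 1).reshape(4, 4)

def is_ppt(rho, tol=1e-12):
    """True if the partial transpose is positive semidefinite;
    for two qubits, PPT is equivalent to separability."""
    return bool(np.linalg.eigvalsh(partial_transpose(rho)).min() >= -tol)

# Bell state |Phi+> = (|00> + |11>)/sqrt(2): entangled, hence NPT.
phi = np.zeros(4)
phi[0] = phi[3] = 1 / np.sqrt(2)
bell = np.outer(phi, phi)

# Maximally mixed state: separable, hence PPT.
mixed = np.eye(4) / 4
```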
To better understand how these priors differ, and how this difference affects the computation of the volume of separable states, it is worth considering the physical connection between the purity of the states and their separability. In figure 2(a), we show the probability of finding a separable two-qubit state as a function of the purity for both priors. Although the two curves are very similar, there appears to be a systematic difference between the two priors: for given purity, states are a bit more likely to be separable for prior I than prior II. The strong similarity, however, tells us that the prior densities conditioned on the purity are close. But the marginal densities for the purity are rather different; see figure 2(b). For our prior II, we find the marginal density peaking at a higher purity value, indicating that this prior puts more weight on the states of higher purity and less weight on the low purity states. This, together with the fact that higher purity states are less likely to be separable, results in a smaller overall probability for our prior to produce a separable state.
To further see the difference between these two approaches, we also plot the probability density of the quantum states for the qubit as well as the qutrit state space in figure 3. Analytical forms of the marginal densities are indicated in the plots by red curves; see appendix C.
5. MCMC sampling
The problems of independence sampling—the low yield in high-dimensional spaces and the difficulty in handling sharply peaked distributions reliably—can be resolved by using the MCMC strategy (see, for instance, [19]). In MCMC, sample points that obey the basic constraints are generated sequentially, with the position of the next point depending on the position of the current point; hence the term 'Markov chain.' One makes use of a random walk such that the next sample point is likely to be in the vicinity of the current point; this gives a high chance of staying within the permissible region if the current point is physical, or within the same peak if the current point is within a sharply peaked region of the target distribution w_t(p), with w_t(p) = w_r(p) w_cstr(p) or w_t(p) = w_D(p) w_cstr(p).
The Markov chain's stationary distribution is to be the target distribution. To achieve this, the Metropolis–Hastings Monte Carlo (MHMC) procedure [9, 10] is adopted when performing the random walk; see appendix
The reparameterization in terms of x requires a corresponding transformation of the target distribution,

w(x) = w_t(p(x)) |det(∂p/∂x)|,

which includes the Jacobian of the map between p and x.
Also, the starting point of the random walk should be a physical probability, which can be ensured by picking a state ρ (the maximally mixed state, for instance), computing the corresponding probabilities from the Born rule, and setting the initial x accordingly.
Putting it all together, the x-parameterized MHMC scheme is as follows:
xMHMC1 Pick an arbitrary state ρ. Obtain p from the Born rule, then set x₁ = x(p) and j = 1.
xMHMC2 Randomly generate Δx, a K-dimensional variable with mean 0 and variance σ².
xMHMC3 Compute x' = x_j + Δx and the corresponding w(x').
xMHMC4 Compute the acceptance ratio

a = min{1, w(x')/w(x_j)}.

xMHMC5 Draw a random number b uniformly from the range 0 ≤ b < 1. If b ≤ a, set x_{j+1} = x'; otherwise, set x_{j+1} = x_j.
xMHMC6 Update j → j + 1. For target number of sample points M, escape the loop if j = M; otherwise, return to xMHMC2.
Some attention should be paid to the choice of variance σ² in xMHMC2. The corresponding standard deviation σ can be viewed as the typical 'step size' in the random walk. Generally, if the step size is too large, the acceptance rate tends to be low, since a single step may take one too far from the permissible or important region; if the step size is too small, the random walk takes a long time to explore the entire space. Therefore, σ has to be chosen carefully. In the statistics literature, using general arguments invoking the central limit theorem in the many-parameter situation, one is told that the optimal σ should be chosen such that the acceptance rate is around 23% (see, for instance, [20] and [21]). This gives a good rule of thumb for choosing the step size σ, and turns out to be quite reasonable even for small-dimensional problems; see figure 4.
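To illustrate the random-walk mechanics and the step-size trade-off, here is a simplified Metropolis sketch for the single-qubit trine, working directly in the Bloch coordinates (x, y) rather than the transformed x space of the scheme above. For the primitive prior the target is uniform over the unit disk, so with a symmetric Gaussian proposal the Metropolis ratio is 1 inside the physical region and 0 outside.

```python
import numpy as np

rng = np.random.default_rng(5)

def mh_disk(n_steps, sigma=0.5):
    """Metropolis random walk targeting the uniform distribution on the
    unit disk (primitive prior for the trine in Bloch coordinates).
    With a symmetric proposal, moves are accepted iff they stay physical."""
    x = np.zeros(2)                  # start at the maximally mixed state
    chain, accepted = [], 0
    for _ in range(n_steps):
        prop = x + sigma * rng.normal(size=2)
        if prop @ prop <= 1.0:       # physicality check
            x = prop
            accepted += 1
        chain.append(x.copy())       # repeated points on rejection
    return np.array(chain), accepted / n_steps

chain, rate = mh_disk(50_000, sigma=0.5)
mean_r2 = (chain**2).sum(axis=1).mean()   # uniform disk gives E[r^2] = 1/2
```

Varying `sigma` shows the trade-off discussed above: large steps lower the acceptance rate, tiny steps slow the exploration of the disk.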
6. Example: incomplete two-qubit tomography
For a comparison of the various sampling methods, we consider the TAT scheme of [22] (see also section 6 in [13]) for quantum key distribution. This TAT scheme can be implemented as follows: begin with a source of entangled qubit pairs and distribute one qubit each to the two communicating parties; one qubit is measured by the trine POM of (4); the other is measured by the antitrine POM, which is (4) with the signs of x and y reversed. The two-qubit POM has nine outcomes subject to the single constraint of unit sum, resulting in an eight-dimensional probability space. For the simulated experiments, we measure 60 copies of qubit pairs; the data D of one such experiment are used in the specification of the optimal error regions below.
We generated various sets of samples in accordance with the primitive prior as well as with the Jeffreys prior. The platform used for generating these samples is a standard desktop (Intel i7-3770 CPU, with quad core and 8 GB RAM). The CPU time taken to generate a sample of physical probabilities with different sampling strategies is summarized in table 1. We also show the time taken by MCMC, but with a physicality check that exploits the structure of the TAT (labeled 'MCMC with parameter searching' in table 1). In the TAT version of the matrix in (47) of [12], eight out of nine real parameters for a reconstruction space can be determined by the probabilities, with the last one free to vary over a bounded range. If such a ninth parameter can be found that makes the corresponding state positive semidefinite, then the generated p is physical; otherwise it is not. The CPU time for this procedure is almost the same as that for MCMC with CG in the physicality check. Moreover, there is barely any difference in time whether the sample is generated according to the primitive prior or the Jeffreys prior, as the time-consuming part is the checking of physicality of the generated probabilities.
Table 1.
CPU time taken to generate the sample of (physical) probabilities with different sampling strategies for qubit pairs measured by the TAT POM in accordance with the primitive prior. DG: direct gradient; CG: conjugate gradient; see appendix A.
Sampling scheme | CPU time
---|---
Independence with DG | 14 hr
MCMC with DG | 1 hr 20 min
MCMC with CG | 11 min
MCMC with parameter searching | 13 min
In figure 5, we show the size as a function of λ for the bounded-likelihood regions for data D, using samples of various sizes. Here, λ is the likelihood threshold for the region, with 0 ≤ λ ≤ 1. The region specified by a given λ is the set of points p with point likelihoods satisfying L(D|p) ≥ λ L(D|p̂_ML), where p̂_ML is the maximum-likelihood estimator. For figure 5(a), using the samples generated by independence sampling, there is not much difference between the curves obtained with samples of different sizes, for both the primitive prior and the Jeffreys prior. However, for figure 5(b), using the samples generated by MCMC, the curve obtained with the smallest sample deviates quite far from the curves obtained with larger sample sizes for the primitive prior. A possible reason is that the chain may have been 'trapped' at the mode of the prior in the transformed x space for a significant portion of the time, and the smallest sample is not large enough for the random walk to reach the whole space. This does not happen for the Jeffreys prior, which is flat in the x space.
In terms of the 'quality' of the sample points, we notice that a relatively small sample generated in accordance with the primitive prior by independence sampling is good enough to produce figure 5, while a larger sample is required if we use MCMC. In this sense, the comparison of the time taken to generate the same number of physical probabilities in table 1 may not always be fair. At least for our particular purpose of calculating the sizes, independence sampling and MCMC are roughly equivalent: the lower acceptance rate of independence sampling is compensated by a faster convergence rate, while MCMC converges more slowly but benefits from a higher acceptance rate.
7. Conclusion
We have shown how one can perform rejection sampling, importance sampling, and MCMC sampling in the probability space (and thus also in the reconstruction space) with due attention paid to all the constraints obeyed by physical probabilities. Rejection sampling and importance sampling are rather simple to implement but they have a low yield and are costly (in CPU time) unless one manages to check the physicality of candidate probabilities in an efficient way. While MCMC sampling tends to be less costly because the yield is higher (fewer candidate probabilities rejected), this comes at the price of correlations in the sample, which in turn requires larger samples to achieve the same accuracy that rejection sampling and importance sampling get for smaller samples. For comparison, we have generated samples of various sizes in the probability space for two-qubit states measured by an incomplete POM. Using the samples created, the sizes for different regions are then calculated.
Once the samples are at hand, one can now efficiently compute the optimal error regions for quantum state estimation introduced in [13], where integrals over high-dimensional regions in the quantum state space must be evaluated. While this application motivated these investigations, the random samples themselves can be used for the many purposes mentioned in the introduction. The algorithms explored here—independence sampling and MCMC sampling—can be applied to problems of any dimension, any choice of measurement, and any target distribution, even if one does not have an explicit parameterization of the state space, as is often the case. This flexibility is of great practical value.
If an explicit parameterization of the state space is available, one can alternatively sample by the so-called HMC algorithm [11]. HMC is very different from both independence sampling and MCMC in many aspects. We deal with HMC in our companion paper [12].
Acknowledgments
We thank Michael Evans and Yong Siah Teo for stimulating discussions. This work is funded by the Singapore Ministry of Education (partly through the Academic Research Fund Tier 3 MOE2012-T3-1-009) and the National Research Foundation of Singapore.
Appendix A.: Determining the physicality of p
Inspired by maximum-likelihood considerations, we choose a figure-of-merit functional F(p, p'), where p is the candidate probability to be checked and p' is a physical probability, such that F(p, p') has a global maximum in p' but no local maxima. The idea is that p is physical if the maximum of F(p, p') over the physical p's is equal to its largest possible value F(p, p); otherwise p is not physical. Examples are F(p, p') = −∑_k (p_k − p'_k)² ≤ 0 or F(p, p') = ∑_k p_k log(p'_k/p_k) ≤ 0, where the equal signs hold only if p' = p, and a maximum below the largest possible value implies that p is not physical.
The physicality of p' is ensured by the Born rule p'_k = tr(ρ'Π_k), where

ρ' = AA†/tr(AA†) (A.1)

is assuredly nonnegative and has unit trace. (Whether these ρ's are in the reconstruction space or not is irrelevant here.) Infinitesimal variations of p' originate in variations of ρ', which in turn result from varying A and A†. The response of F to such infinitesimal changes is

δF = tr(δA G† + G δA†), (A.2)

where

G = (R − ⟨R⟩)A/tr(AA†) with R = ∑_k (∂F/∂p'_k) Π_k (A.3)

and its adjoint G† are the components of the gradient. As in [23], steepest ascent ('follow the gradient uphill') is achieved by putting δA = εG and δA† = εG† with ε > 0, when

δF = 2ε tr(G†G) ≥ 0.

The equal sign holds only when the gradient vanishes, or

Rρ' = ⟨R⟩ρ';

here and elsewhere, the ⟨ ⟩ in ⟨R⟩ and other expectation values always refer to the ρ' under consideration.

Thus, we update A in accordance with

(d/dt)A ∝ G,

or, in discrete form,

A → A + εG.

Numerically, the maximal value of F is reached when ‖(R − ⟨R⟩)ρ'‖₁ < ξ, where ‖M‖₁ is the trace norm for operator M and ξ is a pre-chosen precision. Upon comparing this value for F with F(p, p), we conclude that p is physical or not.
This search for the maximum of F can be carried out by simply following the direct gradient (DG) G of (A.2) and (A.3). Or, we can use the more efficient conjugate-gradient (CG) method [24], with the CG H defined recursively,

H_j = G_j + γ_j H_{j−1}, H₁ = G₁,

where G_j is the direct gradient of the jth iteration and γ_j is known from the Polak–Ribière criterion, determined by the Gs; see (A.10). Here is an outline of the iterative algorithm that employs the CG method for the physicality check (PC) of p, for a d-dimensional quantum system:
PCCG1 Start with j = 1, two fixed constants ε and ξ, the d×d identity matrix A₁ = 1, and the maximally mixed state ρ₁ = 1/d.
PCCG2 Compute R₁ of (A.3) and set G₁ = (R₁ − ⟨R₁⟩)A₁/tr(A₁A₁†), H₁ = G₁.
PCCG3 Escape the loop if ‖(R_j − ⟨R_j⟩)ρ_j‖₁ < ξ; otherwise, proceed with PCCG4–8.
PCCG4 Set A_{j+1} = A_j + εH_j and compute

ρ_{j+1} = A_{j+1}A_{j+1}†/tr(A_{j+1}A_{j+1}†).

PCCG5 Compute R_{j+1} according to (A.3) and set G_{j+1} = (R_{j+1} − ⟨R_{j+1}⟩)A_{j+1}/tr(A_{j+1}A_{j+1}†).
PCCG6 Compute the Polak–Ribière coefficient γ_{j+1} from G_j, G_{j+1}, and the parameter χ; see (A.10).
PCCG7 Set H_{j+1} = G_{j+1} + γ_{j+1}H_j.
PCCG8 Update j → j + 1 and repeat the iteration from PCCG3.
Instead of using a constant ε in PCCG4, we employ a suitable line-optimization procedure to speed up the algorithm [24]. Such a line search can in principle expedite the optimization, but may become impractical in higher dimensions as the evaluation of many large matrices is computationally very expensive. In this case, a fixed value of ε is used instead. As for the parameter χ used in PCCG6, it should be chosen suitably, with smaller values typically leading to quicker convergence. Compared with the direct-gradient method, each iteration of the CG method is slightly more expensive computationally. However, the number of iterations required for convergence is much smaller for the CG method. The overall effect is that the CPU time required by the CG method is a fraction of that required by the direct-gradient method.
Appendix B.: Sampling uniformly over the basic probability simplex
We describe two algorithms for sampling uniformly over the basic probability simplex, i.e., for generating p with $p_k\geq0$ and $\sum_{k=1}^{K}p_k=1$.
The first algorithm rests on the observation that K exponential random variables, when normalized, follow a Dirichlet distribution [25], where the α parameters are positive. Since a uniform sample over the basic probability simplex is a Dirichlet distribution with $\alpha_k=1$ for all k, we can sample uniformly over the simplex as follows:
Bas1 Generate K random numbers $x_1,x_2,\ldots,x_K$ uniformly over the interval (0, 1).
Bas2 Set $y_k=-\ln x_k$. Each yk obtained this way is drawn from an exponential distribution with unit rate parameter, $\lambda=1$.
Bas3 Set $p_k=y_k\big/\sum_{k'=1}^{K}y_{k'}$.
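A minimal Python sketch of Bas1–Bas3, assuming numpy for the random numbers:

```python
import numpy as np

def sample_simplex(K, rng=None):
    """Uniform sample over the basic probability simplex via Bas1-Bas3."""
    if rng is None:
        rng = np.random.default_rng()
    x = rng.uniform(size=K)   # Bas1: K uniform numbers in (0, 1)
    y = -np.log(x)            # Bas2: unit-rate exponential variates
    return y / y.sum()        # Bas3: normalize to obtain p
```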
The second algorithm, analyzed in [26], samples as follows:
Bas'1 Start with $x_0=0$ and $x_K=1$.
Bas'2 Draw K − 1 random numbers uniformly over the interval (0, 1), and sort the list from smallest to largest: $x_1\leq x_2\leq\cdots\leq x_{K-1}$.
Bas'3 Set $p_k=x_k-x_{k-1}$ for $k=1,2,\ldots,K$.
For both algorithms, the resulting $p=(p_1,p_2,\ldots,p_K)$ is uniformly distributed over the simplex.
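A minimal numpy sketch of the second algorithm, Bas'1–Bas'3:

```python
import numpy as np

def sample_simplex_spacings(K, rng=None):
    """Uniform simplex sample from sorted spacings, following Bas'1-Bas'3."""
    if rng is None:
        rng = np.random.default_rng()
    x = np.sort(rng.uniform(size=K - 1))    # Bas'2: K-1 sorted uniforms
    x = np.concatenate(([0.0], x, [1.0]))   # Bas'1: endpoints x_0 = 0, x_K = 1
    return np.diff(x)                       # Bas'3: p_k = x_k - x_{k-1}
```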
Appendix C.: Probability density for the purity
Prior I of section 4 and figures 2 and 3 is defined by the sampling method employed by Życzkowski et al [3]. They choose the eigenvalues $\lambda_k$ of the statistical operator ρ uniformly from the simplex with $\lambda_k\geq0$ and $\sum_k\lambda_k=1$, and represent the eigenkets of ρ by the eigencolumns of a Haar-random unitary matrix.
For a statistical operator from this sample, the purity is the sum of the squared eigenvalues, $\operatorname{tr}(\rho^2)=\sum_k\lambda_k^2$.
We write $w(\xi)\,\mathrm{d}\xi$ for the probability that the purity is between ξ and $\xi+\mathrm{d}\xi$; the density $w(\xi)$ is nonzero for $1/d\leq\xi\leq1$. Explicit expressions for the purity density are
for the qubit case of figure 3(a);
for the qutrit case of figure 3(b); and
for the qubit-pair case of figure 2(b).
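As a numerical cross-check of these densities, one can sample purities under prior I directly. The sketch below (numpy assumed) exploits the fact that the purity depends only on the eigenvalues, so the Haar-random eigenbasis drops out and only uniform simplex points are needed:

```python
import numpy as np

def purities_prior_I(d, n, rng=None):
    """Sample n purities tr(rho^2) under prior I: eigenvalues drawn
    uniformly from the simplex; the Haar-random eigenbasis does not
    affect the purity and is therefore omitted."""
    if rng is None:
        rng = np.random.default_rng()
    y = -np.log(rng.uniform(size=(n, d)))    # exponential variates
    lam = y / y.sum(axis=1, keepdims=True)   # uniform simplex points
    return (lam ** 2).sum(axis=1)            # purity = sum of squared eigenvalues
```

For the qubit (d = 2), the sampled purities lie in [1/2, 1] and their mean reproduces the value 2/3 implied by the uniform eigenvalue distribution, which can serve as a sanity check on the analytical expressions.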
The prior density of prior II is proportional to of (8) and (7) with . In the case of the qubit and the tetrahedron POM of [18], the four probabilities are related to the expectation values of the Pauli operators by equations (25) of [12], and (26) there states the factor for the quantum constraint in (6). The purity is
in terms of the tetrahedron probabilities. Accordingly, we get the purity density
see figure 3(a). Likewise, we find that
for the qutrit case of figure 3(b) and the qubit-pair case of figure 2(b), but we do not have analytical expressions for larger values of ξ because the quantum constraints introduce complications that are difficult to cope with.
Appendix D.: MHMC algorithm
For a random walk over a parameter space with elements θ and a target distribution $t(\theta)$, the MHMC algorithm is the following:
MHMC1 Choose a proposal-generating density, say $J(\theta'|\theta)$, which describes the probability density of proposing point θ' given the current point θ.
MHMC2 Choose a starting point $\theta_1$, and set j = 1.
MHMC3 Randomly draw a candidate θ' from the density $J(\theta'|\theta_j)$.
MHMC4 Compute the acceptance ratio $a=\min\bigl\{1,\,t(\theta')J(\theta_j|\theta')\big/[t(\theta_j)J(\theta'|\theta_j)]\bigr\}$, where t denotes the target distribution.
MHMC5 Draw a random number b uniformly from the range $0<b<1$. If $b\leq a$, set $\theta_{j+1}=\theta'$; otherwise, set $\theta_{j+1}=\theta_j$. This implements the criterion of accepting the new proposal with probability a.
MHMC6 Update $j\to j+1$. Escape the loop when j = M, the target number of sample points; otherwise, return to MHMC3.
The proposal-generating density J determines where we move in the next step of the Markov chain, and it is convenient to choose one that is symmetric, i.e., $J(\theta'|\theta)=J(\theta|\theta')$ for all θ and θ'. This symmetry simplifies the expression for the acceptance ratio and, furthermore, relieves us of the need to know the specific form of J beyond enforcing the symmetry. A common symmetric choice for $J(\theta'|\theta)$ is the (multivariate) Gaussian distribution with mean θ and a constant covariance matrix.
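The steps MHMC1–MHMC6 with a symmetric Gaussian proposal can be sketched as follows (numpy assumed; the names `target` and `step` are illustrative, and the target density may be unnormalized since normalization cancels in the acceptance ratio):

```python
import numpy as np

def mhmc(target, theta1, M, step=0.5, rng=None):
    """Random-walk Metropolis-Hastings sampler following MHMC1-6, with a
    symmetric Gaussian proposal so that J drops out of the acceptance ratio.
    `target` is the (possibly unnormalized) target density t(theta)."""
    if rng is None:
        rng = np.random.default_rng()
    theta = np.atleast_1d(np.asarray(theta1, dtype=float))      # MHMC2
    t_cur = target(theta)
    chain = [theta]
    while len(chain) < M:                                       # MHMC6 loop
        prop = theta + step * rng.standard_normal(theta.shape)  # MHMC3
        t_prop = target(prop)
        a = min(1.0, t_prop / t_cur)                            # MHMC4: symmetric J
        if rng.uniform() <= a:                                  # MHMC5: accept
            theta, t_cur = prop, t_prop                         #   with probability a
        chain.append(theta)
    return np.array(chain)
```

Run on a one-dimensional standard Gaussian target, the chain's sample mean and variance approach 0 and 1, as expected for a well-mixed random walk.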
Footnotes
- 6
Ironically, the statisticians had earlier learned the methods from physicists.
- 7
In the literature, POM is often written as POVM, standing for 'positive operator-valued measure'. This reference to the mathematical discipline of measure theory arises for historical reasons. We prefer the physical term POM as more descriptive and relevant here.
- 8
For example, if the probabilities determine all nondiagonal elements of the 3 × 3 matrix of a qutrit state, it is not possible to assign diagonal matrix elements such that the reconstruction space is convex.
- 9
- 10
- 11
The same random sample according to the primitive prior can also be obtained as follows [5]: generate a square random matrix A with all entries being independent complex Gaussian numbers, and compute $\rho=AA^{\dagger}\big/\operatorname{tr}\{AA^{\dagger}\}$, which is automatically physical. This procedure is much faster, but it only applies to informationally complete POMs with the primitive prior.
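A sketch of this construction, assuming numpy:

```python
import numpy as np

def primitive_prior_state(d, rng=None):
    """Random state rho = A A^dag / tr(A A^dag) from a complex Gaussian
    matrix A [5]; rho is automatically a valid statistical operator."""
    if rng is None:
        rng = np.random.default_rng()
    A = rng.standard_normal((d, d)) + 1j * rng.standard_normal((d, d))
    rho = A @ A.conj().T              # positive semidefinite by construction
    return rho / np.trace(rho).real   # normalize to unit trace
```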