Published in: Quantum Information Processing 2/2024

Open Access 01-02-2024

Estimating quantum mutual information through a quantum neural network

Authors: Myeongjin Shin, Junseo Lee, Kabgyun Jeong



Abstract

We propose a quantum machine learning method called quantum mutual information neural estimation (QMINE) for estimating the von Neumann entropy and quantum mutual information, which are fundamental quantities in quantum information theory. QMINE utilizes quantum neural networks (QNNs), which are believed to be more powerful for processing quantum datasets than conventional neural networks owing to quantum superposition and entanglement, to minimize a loss function whose infimum determines the von Neumann entropy and, in turn, the quantum mutual information. To create a precise loss function, we propose a quantum Donsker-Varadhan representation (QDVR), which is a quantum analog of the classical Donsker-Varadhan representation. By exploiting the parameter shift rule on parameterized quantum circuits, we can efficiently implement and optimize the QNN and estimate the quantum entropies using the QMINE technique. Furthermore, numerical observations support our predictions for QDVR and demonstrate the good performance of QMINE.
Notes

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

The concept of quantum mutual information (QMI) in quantum information theory quantifies the amount of information shared between two quantum systems. This extends the classical notion of mutual information to the quantum regime [13]. This information measure is fundamental because it determines the quantum correlation or entanglement between two quantum systems. The information obtained from quantum mutual information can be applied to various fields of quantum information processing such as quantum computation, quantum cryptography, and quantum communication [2, 3] (particularly in quantum channel capacity problems [4, 5]). It also plays a crucial role in quantum machine learning [6, 7], where it measures the information shared between different representations of quantum datasets. Moreover, the gathered information can be used to enhance the efficiency and effectiveness of quantum algorithms in processing quantum data.
Quantum mutual information is expressed as the sum of von Neumann entropies, denoted by \(S(\rho )=-{\text {Tr}}(\rho \ln \rho )\) for a quantum state \(\rho \), making the determination of the von Neumann entropy [8] essential for calculating quantum mutual information. In recent years, the estimation of the von Neumann entropy has garnered significant attention in the field of quantum information theory. Various methods have been proposed to estimate von Neumann entropy, including those exploiting quantum state tomography [9], Monte Carlo sampling [10], and entanglement entropy [11–18]. Several studies [16–18] have utilized quantum query models for entropy estimation and have demonstrated promising quantum speedups. Specifically, Wang et al. [16] proposed that the von Neumann entropy can be estimated with an accuracy of \(\varepsilon \) by using \(O(\frac{r^2}{\varepsilon ^2})\) queries. However, these query model-based algorithms have practical limitations because a quantum circuit that generates the quantum state must be prepared, and the effectiveness of constructing a quantum query model for the input state remains an open question [15]. Thus, we focused on estimating the von Neumann entropy of an unknown quantum state using only identical copies of the state. To the best of our knowledge, no existing quantum algorithms estimate the von Neumann entropy using \(O(\text {poly}(r), \text {poly}(\frac{1}{\varepsilon }))\) copies of the quantum state, where r represents the rank of the state.
A mutual information neural estimation (MINE) method is a novel technique that utilizes neural networks to calculate the classical mutual information between two random variables. More precisely, this method optimizes a neural network to estimate mutual information by minimizing the loss function. The loss function is based on the Donsker-Varadhan representation [19] that provides a lower bound for the well-known Kullback–Leibler (KL) divergence.
Quantum neural networks (QNNs) [20, 21], which are among the most powerful quantum machine learning methods, serve as quantum counterparts to conventional neural networks, and offer several advantages. One notable advantage is the ability to use a quantum state as an input, which is particularly advantageous when calculating quantum mutual information or the von Neumann entropy. We identified two types of QNNs in the literature [20, 22] that possess a neural network structure and leverage quantum advantages accompanied by well-defined training procedures. In this study, we employed a parameterized quantum circuit [22], which is known for its quantum advantages, despite the presence of the barren plateau problem, which requires further investigation [23].
As a quantum analog of MINE, we propose a quantum mutual information neural estimation (QMINE), which is a method for determining the von Neumann entropy and quantum mutual information through a quantum neural network technique. Similar to the classical case, QMINE uses a quantum neural network to minimize the loss function that evaluates the von Neumann entropy. To generate a loss function that estimates the von Neumann entropy, we present the quantum Donsker-Varadhan representation (QDVR), which is a quantum version of the Donsker-Varadhan representation. QMINE offers the potential for a quantum advantage in estimating the von Neumann entropy facilitated by QDVR. By converting the problem of von Neumann entropy estimation into a quantum machine learning regime, QMINE opens new possibilities. There is also the potential to estimate von Neumann entropy using only \(O(\text {poly}(r), \text {poly}(\frac{1}{\varepsilon }))\) copies of the quantum state. However, we acknowledge that further investigation is required owing to the challenging and well-known barren plateau problem, as well as the need for efficient quantum training methods.
The remainder of this paper is organized as follows. In Sect. 2, we briefly introduce the basic notions of quantum mutual information, MINE, and parametrized quantum circuits. In Sect. 3, we generalize the Donsker-Varadhan representation to a QDVR, which is the main component of QMINE. In Sect. 4, we propose an estimation method for the von Neumann entropy using quantum neural networks, which implies that the quantum mutual information can be obtained efficiently; numerical simulations under the QMINE framework are presented in Sect. 5. Finally, a discussion and remarks are presented in Sect. 6, where open questions and possibilities for future research are raised.
Note on concurrent work. The independent and concurrent work [24] appeared on the arXiv a few days after our preprint was uploaded. It introduced a method, reminiscent of ours, for estimating the von Neumann entropy, along with the Rényi entropy, measured relative (Rényi) entropy, and fidelity. Our work focuses on estimating the von Neumann entropy with low copy complexity. We reduce the domain in the variational formula, whereas Ref. [24] does not; we believe that limiting the trace and rank in the variational formula is crucial for effective estimation.

2 Preliminaries

2.1 Quantum mutual information and von Neumann entropy

Quantum mutual information, also known as von Neumann mutual information, quantifies the correlations between two subsystems of a quantum state. It can be calculated using the formula (see Fig. 1):
$$\begin{aligned} I\left( A:B\right) = S\left( \rho ^A\right) + S\left( \rho ^B\right) - S\left( \rho ^{AB}\right) . \end{aligned}$$
(1)
Here, \(S(\rho )\) represents the von Neumann entropy [8] of quantum state \(\rho \) in a d-dimensional Hilbert space, given by \(S\left( \rho \right) = -\text {Tr}\left( \rho \log \rho \right) \). Therefore, estimating the von Neumann entropy enables the estimation of the quantum mutual information.
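As a concrete illustration of Eq. (1), the following numpy sketch computes \(I(A:B)\) for a two-qubit Bell state by exact diagonalization. The helper names are ours, chosen for illustration only.

```python
import numpy as np

def von_neumann_entropy(rho):
    """S(rho) = -Tr(rho log rho), computed from the eigenvalues (natural log)."""
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]          # 0 * log(0) = 0 by convention
    return float(-np.sum(evals * np.log(evals)))

def partial_trace(rho_ab, keep, dims):
    """Trace out one subsystem of a bipartite state: keep=0 keeps A, keep=1 keeps B."""
    dA, dB = dims
    r4 = rho_ab.reshape(dA, dB, dA, dB)
    return np.trace(r4, axis1=1, axis2=3) if keep == 0 else np.trace(r4, axis1=0, axis2=2)

def quantum_mutual_information(rho_ab, dims):
    """I(A:B) = S(rho_A) + S(rho_B) - S(rho_AB), as in Eq. (1)."""
    sA = von_neumann_entropy(partial_trace(rho_ab, 0, dims))
    sB = von_neumann_entropy(partial_trace(rho_ab, 1, dims))
    return sA + sB - von_neumann_entropy(rho_ab)

# Bell state |Phi+> is maximally entangled, so I(A:B) = 2 ln 2.
phi = np.zeros(4); phi[0] = phi[3] = 1 / np.sqrt(2)
rho_bell = np.outer(phi, phi)
print(quantum_mutual_information(rho_bell, (2, 2)))   # ~ 1.3863 = 2 ln 2
```

For the maximally entangled state, both reduced states are maximally mixed, so the mutual information reaches its two-qubit maximum of \(2\ln 2\).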
The von Neumann entropy, which is an extension of the Shannon entropy [25] to the quantum domain, can be estimated using quantum circuits and measurements. It is defined as the entropy of the density matrix associated with a quantum state, where the density matrix is a positive semi-definite matrix that represents the state. To estimate the von Neumann entropy, measurements can be performed on multiple copies of the quantum state and the outcomes of these measurements can be utilized. The most straightforward approach is to directly estimate the density matrix and calculate the entropy using its definition. However, estimating the von Neumann entropy can be challenging, particularly for large quantum systems, owing to the difficulty in accurately estimating the density matrix. Furthermore, the estimation accuracy is influenced by the number of measurements conducted and the quality of the quantum hardware employed. Nevertheless, ongoing research is focused on developing more efficient and precise methods for estimating the von Neumann entropy.
Several methods have been employed to estimate von Neumann entropy, particularly those utilizing the quantum query model [15, 17, 18]. In the quantum query model, if the quantum circuit U produces a quantum state \(\rho \), it utilizes unitary gates such as U, \(U^{\dagger }\), and CU (controlled-U). However, the quantum circuit must be known to use the query model. The effectiveness of constructing a quantum query model for a given input state remains uncertain [15], prompting us to explore the von Neumann entropy estimation without relying on the quantum query model. In the absence of a query model, our approach solely exploits identical copies of quantum states. Previous studies, such as that of Acharya et al. [13], employed \(O(d^2)\) copies of the quantum state \(\rho \), where d denotes the dimension, whereas Wang et al. [15] used \(O\left( \frac{1}{\varepsilon ^{5}\lambda ^{2}}\right) \) copies of \(\rho \), where \(\lambda \) represents the lower bound on all nonzero eigenvalues. To the best of our knowledge, no existing algorithm provides a high-accuracy estimation of the von Neumann entropy by using only \(O(\text {poly}(r))\) copies of \(\rho \) with rank r.

2.2 Mutual information neural estimator

The mutual information neural estimator (MINE) [26] is a method for estimating the mutual information of two random variables by using neural networks. This approach involves selecting functions \(T_{\theta }: X \times Y \rightarrow \mathbb {R}\) that are parameterized by neural networks with the parameter \(\theta \in \varTheta \). Considering n samples, we define the empirical joint and product probability distributions as \(p_{XY}^{(n)}\) and \(p_X^{(n)} \times p_Y^{(n)}\), respectively. The MINE strategy is given by:
$$\begin{aligned} \widehat{I(X:Y)_n} = \sup _{\theta \in \varTheta } \mathbb {E}_{p_{XY}}\left[ T_{\theta }\right] -\log \left( \mathbb {E}_{p_X \times p_Y}\left[ e^{T_{\theta }}\right] \right) , \end{aligned}$$
(2)
where \({\mathbb {E}}\) denotes the expected value. The Donsker-Varadhan representation states that, for any probability distribution functions p and q,
$$\begin{aligned} D_{KL}(p||q) = \sup _{T: \varOmega \rightarrow \mathbb {R}} \mathbb {E}_p[T]-\log \left( \mathbb {E}_q\left[ e^T\right] \right) , \end{aligned}$$
(3)
where we take the supremum over all the functions T.
Using the Donsker-Varadhan representation [27], it can be proven that \(I\left( X:Y\right) \ge \widehat{I\left( X:Y\right) _n}\) and that MINE is strongly consistent, meaning that for any \(\varepsilon > 0\) there exist a positive integer N and a choice of neural network parameters \(\theta \in \varTheta \) such that for all \(n \ge N\), \(\left| I(X:Y) - \widehat{I(X:Y)_n}\right| \le \varepsilon \). By applying a gradient descent method on the neural network \(T_{\theta }\) to maximize \(\mathbb {E}_{p_{XY}}[T_{\theta }]-\log \left( \mathbb {E}_{p_X \times p_Y}\left[ e^{T_{\theta }}\right] \right) \), we can obtain \(\widehat{I(X:Y)_n}\) and estimate the mutual information I(X : Y).
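The Donsker-Varadhan bound in Eq. (3) can be checked numerically for discrete distributions: every function T yields a lower bound on \(D_{KL}(p\Vert q)\), and \(T^* = \log (p/q)\) attains the supremum. A minimal numpy sketch (function names are ours):

```python
import numpy as np

rng = np.random.default_rng(0)
p = rng.random(8); p /= p.sum()           # discrete distribution p
q = rng.random(8); q /= q.sum()           # discrete distribution q

kl = float(np.sum(p * np.log(p / q)))     # D_KL(p||q)

def dv_objective(T):
    """E_p[T] - log E_q[e^T], the quantity maximized in Eq. (3)."""
    return float(np.sum(p * T) - np.log(np.sum(q * np.exp(T))))

# Any T yields a lower bound on the KL divergence ...
for _ in range(100):
    assert dv_objective(rng.normal(size=8)) <= kl + 1e-12

# ... and T* = log(p/q) attains the supremum.
print(kl, dv_objective(np.log(p / q)))    # the two values coincide
```

MINE replaces the exhaustive search over T by a neural network \(T_\theta \) trained by gradient ascent on the same objective.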
The MINE technique has found applications in various areas of artificial intelligence, such as feature selection, representation learning, and unsupervised learning, using information-theoretic methods. Compared to previous approaches, it provides more accurate and robust estimates of mutual information, leading to significant advancements in the field of artificial intelligence (AI). It is important to recognize that MINE is a relatively new and rapidly evolving field, with ongoing research focused on enhancing and broadening its capabilities. Nonetheless, the MINE technique is widely regarded as a valuable tool in AI and information theory, offering a powerful and flexible approach for estimating the mutual information between variables.

2.3 Parametrized quantum circuits

Parameterized quantum circuits (PQCs) [22] are quantum circuits that incorporate adjustable parameters, typically represented as real numbers. These parameters can be fine-tuned to control the behavior of the quantum circuit, thereby providing increased flexibility and optimization potential. Parameterized quantum circuits have extensive applications in quantum machine learning and optimization algorithms, enabling computations that are challenging or even infeasible using classical methods.
By manipulating circuit parameters, one can efficiently learn and represent quantum systems in a compact and adaptable manner. In quantum optimization, parameterized quantum circuits play a crucial role in global minimum search. The objective function can be represented as a measurement outcome of the quantum circuit. The quantum circuit can harness superposition and entanglement to explore the search space more effectively than classical optimization algorithms.
One of the core techniques used in quantum optimization procedures for parameterized quantum circuits is the parameter shift rule [28]. The parameter shift rule is a powerful tool in quantum machine learning that enables efficient computation of gradients with respect to the parameters of a quantum circuit.
The fundamental concept behind the parameter shift rule is to employ a quantum circuit with adjustable parameters to perform the measurements. By utilizing the measurement outcome, it is possible to estimate the gradient of a cost function with respect to the circuit parameters. This rule capitalizes on the notion that small variations in the parameters of a quantum circuit can be used to calculate the derivative of the cost function pertaining to these parameters.
The underlying principle involves preparing two identical copies of a quantum state, each evaluated at slightly shifted parameter values. By comparing measurement outcomes on these two states, collected over multiple samples, the gradient can be estimated.
If a parameterized quantum circuit is represented as a sequence of unitary gates, it is denoted as
$$\begin{aligned} U(x;\theta ):= \left( \prod _{i=1}^{N} U_i(\theta _i)\right) U_0(x). \end{aligned}$$
The output of the circuit can then be observed using an observable \(\hat{O}\) and the measurement outcome becomes a quantum circuit function. The quantum circuit function is expressed in simplified form as \(f(x;\theta _i) = \langle \psi _i| U_i^\dagger (\theta _i) \hat{O}_{i+1} U_i(\theta _i) |\psi _i\rangle \) for each i. The gradient of the quantum circuit function can then be calculated using the parameter shift rule, as follows:
$$\begin{aligned} \nabla _{\theta _i} f(x;\theta _i) = c \left( \langle \psi _i| U_i^\dagger (\theta _i+s) \hat{O}_{i+1} U_i(\theta _i+s) |\psi _i\rangle - \langle \psi _i| U_i^\dagger (\theta _i-s) \hat{O}_{i+1} U_i(\theta _i-s) |\psi _i\rangle \right) . \end{aligned}$$
(4)
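For a single gate generated by a Pauli operator, the shift rule in Eq. (4) holds with \(c = 1/2\) and \(s = \pi /2\). A minimal single-qubit check with \(f(\theta ) = \langle 0|R_x^\dagger (\theta )\, Z\, R_x(\theta )|0\rangle = \cos \theta \) (our own toy circuit, not the paper's):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def Rx(theta):
    """Single-qubit rotation exp(-i * theta/2 * X)."""
    return np.cos(theta / 2) * np.eye(2) - 1j * np.sin(theta / 2) * X

def f(theta):
    """Circuit function <0| Rx^dag(theta) Z Rx(theta) |0> = cos(theta)."""
    psi = Rx(theta) @ np.array([1, 0], dtype=complex)
    return float(np.real(psi.conj() @ Z @ psi))

def grad_param_shift(theta):
    """Eq. (4) with c = 1/2 and s = pi/2, exact for Pauli-generated gates."""
    return 0.5 * (f(theta + np.pi / 2) - f(theta - np.pi / 2))

theta = 0.7
print(grad_param_shift(theta), -np.sin(theta))   # agree to machine precision
```

Unlike a finite-difference scheme, the shift here is macroscopic (\(\pi /2\)), and the identity \(\frac{1}{2}[\cos (\theta +\frac{\pi }{2})-\cos (\theta -\frac{\pi }{2})] = -\sin \theta \) makes the gradient exact up to sampling noise.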
The parameter shift rule has been successfully employed in various quantum machine learning algorithms, including quantum neural networks [20, 22] and quantum support vector machines [29, 30], for optimization and training purposes. It is regarded as a valuable tool for developing efficient quantum machine learning algorithms, as it enables the efficient computation of gradients in quantum systems, which is often a challenging task. It is important to note that, although the rule yields exact gradients for gates generated by Pauli operators, the accuracy of the resulting estimates depends on factors such as the number of measurement shots, the cost function, and the specific quantum circuit. Nevertheless, it has proven to be a useful and efficient technique in the emerging field of quantum machine learning, and ongoing research focuses on enhancing and expanding its capabilities.

3 Quantum Donsker-Varadhan representation

The quantum Donsker-Varadhan representation is a mathematical framework that enables quantum neural networks to estimate the von Neumann entropy. It is a quantum counterpart of the original Donsker-Varadhan representation, with the distinction that QDVR focuses solely on the quantum entropy rather than on the relative entropy. QDVR can be considered as a modified version of the Gibbs variational principle [31], which restricts the domain to density matrices.
As mentioned previously, MINE [26] exploits the original Donsker-Varadhan representation to estimate classical mutual information using a (classical) neural network. In the context of estimating mutual quantum information, it is natural to consider a quantum version of the Donsker-Varadhan representation. Notably, we need only estimate the components of von Neumann entropy \(S(\rho _A)\), \(S(\rho _B)\), and \(S(\rho _{AB})\) to determine the quantum mutual information I(A : B). A variational formula for von Neumann entropy exists as follows:
Theorem 1
(Gibbs Variational Principle [31]) Let \(\rho \) be a density matrix, and let \(f: H^{d \times d} \rightarrow \mathbb {R}\) be the function on d-dimensional Hermitian matrices T defined by
$$\begin{aligned} f(T) = -{\text {Tr}}(\rho T) + \log \left( {\text {Tr}}(e^T)\right) . \end{aligned}$$
(5)
Thus, for d-dimensional Hermitian matrices T, the von Neumann entropy is given by:
$$\begin{aligned} S(\rho ) = \inf _{T} f(T), \end{aligned}$$
(6)
where the infimum is taken over all Hermitian T.
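Theorem 1 can be verified numerically: \(f(T) \ge S(\rho )\) for any Hermitian T, with equality at the minimizer \(T = \log \rho \). A small numpy sketch (helper names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
d = 4

def herm_fn(H, fn):
    """Apply a scalar function to a Hermitian matrix via its eigendecomposition."""
    w, V = np.linalg.eigh(H)
    return (V * fn(w)) @ V.conj().T

# Random full-rank density matrix.
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = A @ A.conj().T
rho /= np.trace(rho).real

S = float(-np.trace(rho @ herm_fn(rho, np.log)).real)   # von Neumann entropy

def f(T):
    """f(T) = -Tr(rho T) + log Tr(e^T), Eq. (5)."""
    return float(-np.trace(rho @ T).real + np.log(np.trace(herm_fn(T, np.exp)).real))

# f(T) >= S(rho) for every Hermitian T ...
for _ in range(50):
    B = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
    assert f((B + B.conj().T) / 2) >= S - 1e-9

# ... with equality at T = log(rho).
print(S, f(herm_fn(rho, np.log)))   # equal up to rounding
```

The equality case follows directly: \(f(\log \rho ) = -{\text {Tr}}(\rho \log \rho ) + \log {\text {Tr}}(\rho ) = S(\rho )\).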
Our objective is to determine the Hermitian matrix T that minimizes f(T). We parameterize T using \(t_i \in \mathbb {R}\) and \(\left| \psi _i\right\rangle \in \mathbb {C}^d\): writing \(T = \sum _{i=1}^d t_i \left| \psi _i\right\rangle \left\langle \psi _i\right| \) in its eigenbasis gives \(f(T) = -\sum _{i=1}^d t_i \left\langle \psi _i\right| \rho \left| \psi _i\right\rangle + \log \left( \sum _{i=1}^d e^{t_i}\right) \). To compute f(T), we must measure the quantum state \(\rho \) in the basis \(\{\left| \psi _i\right\rangle \}_{i=1}^d\). Achieving an error smaller than \(\varepsilon \) requires \(O(\frac{\sigma ^2}{\varepsilon ^{2}})\) copies of \(\rho \), where \(\sigma ^2:=\text {Var}(\{t_i\})\). However, the number of required copies of \(\rho \) can become substantial because the domain of T, which encompasses all Hermitian matrices, is broad. Therefore, reducing the size of this domain is imperative.
Lemma 1
For all Hermitian matrices T, the function f satisfies
$$\begin{aligned} f(T) = f(T+cI) \end{aligned}$$
(7)
for any constant \(c \in \mathbb {R}\).
Proof
For any \(T \in H^{d \times d}\), we have
$$\begin{aligned} f(T+cI)&= -\text {Tr}(\rho (T+cI)) + \log \left( \text {Tr}(e^{T+cI})\right) \\&= -\text {Tr}(\rho T) - c\text {Tr}(\rho ) + \log \left( e^c\text {Tr}(e^T)\right) \\&= -\text {Tr}(\rho T) - c + c + \log \left( \text {Tr}(e^T)\right) \\&= f(T). \end{aligned}$$
Here, we used \(\text {Tr}(\rho ) = 1\). Thus, \(f(T) = f(T+cI)\) for any constant c. \(\square \)
Proposition 1
(Domain Reduction) Let \(f: H^{d \times d} \rightarrow \mathbb {R}\) be a function defined on d-dimensional Hermitian matrices and let \(\rho \) be a density matrix. Then,
$$\begin{aligned} S(\rho ) = \inf _Tf(T) \end{aligned}$$
(8)
where the infimum is taken over d-dimensional positive (i.e., positive semi-definite) Hermitian matrices T.
Proof
For any Hermitian matrix \(T \in H^{d \times d}\), let \(c = \max _{\left| \psi \right\rangle \in \mathbb {C}^d, \Vert \psi \Vert = 1}(-\left\langle \psi \right| T\left| \psi \right\rangle )\), i.e., minus the smallest eigenvalue of T. From Lemma 1, there exists a positive Hermitian matrix \(T_0 = T+cI\) such that \(f(T)=f(T_0)\). Therefore, we can reduce the domain of T to positive Hermitian matrices. \(\square \)
Now, we only need to search the space of positive Hermitian matrices to find the optimal T. The number of copies of \(\rho \) needed to calculate f(T) still depends on T. To reduce this cost, we need to specify and limit the trace of T.
Lemma 2
A positive Hermitian matrix \(T_0\) with rank r exists that satisfies \({\text {Tr}}(T_0) \le 2rn + r\log \left( \frac{1}{\varepsilon }\right) \), where \(n = \log d\), such that
$$\begin{aligned} \left| S(\rho ) - f(T_0)\right| < \varepsilon , \end{aligned}$$
(9)
where \(\rho \) is an r-rank density matrix.
Proof
See the details of the proof in “Appendix A”. \(\square \)
Proposition 2
(Quantum Donsker-Varadhan Representation) Let \(\rho \) be an r-rank density matrix in d dimensions, and define
$$\begin{aligned} g(T) =-{\text {Tr}}\left( c \rho T\right) + \log \left( {\text {Tr}}(e^{cT})\right) , \end{aligned}$$
(10)
where \(c \ge 2rn + r\log \left( \frac{1}{\varepsilon }\right) \). Then,
$$\begin{aligned} \left| S(\rho ) - \inf (g(T))\right| < \varepsilon \end{aligned}$$
(11)
where the infimum is taken over all d-dimensional r-rank density matrices T.
Proof
By using Lemmas 1 and 2, there exists an r-rank positive Hermitian matrix \(T_0\) with \(\text {Tr}(T_0) = c\) such that \(|S(\rho ) - f(T_0)| < \varepsilon \). Thus, \(T_1 = \frac{T_0}{c}\) is an r-rank density matrix, and \(|S(\rho ) - g(T_1)| < \varepsilon \). From Theorem 1, \(S(\rho ) \le g(T)\) for all density matrices T. Therefore, \(\left| S(\rho ) - \inf (g(T))\right| \le \left| S(\rho ) - g(T_1)\right| < \varepsilon \). This completes the proof. \(\square \)
According to the quantum Donsker-Varadhan representation in Proposition 2, we only need to search within the space of density matrices. To calculate g(T) with an error of \(\varepsilon \), the number of required copies of \(\rho \) is \(O(\frac{c^2}{\varepsilon ^2})\). The remaining task is to determine the optimal density matrix T that minimizes g(T); in the next section, we use quantum neural networks for this purpose.
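Proposition 2 can be checked numerically with the explicit construction from the proof of Lemma 2, working in the eigenbasis of \(\rho \) so that every matrix is diagonal and represented by its eigenvalue vector. The names and toy parameter choices below are ours:

```python
import numpy as np

rng = np.random.default_rng(2)
d, r, eps = 16, 3, 0.05                   # toy sizes chosen for illustration
k = eps / d**2

# Rank-r spectrum of rho, embedded in d dimensions.
p = rng.random(r); p /= p.sum()
S = float(-np.sum(p * np.log(p)))

# Appendix-A construction: t_i = log(p_i / k) when p_i >= k, else 0.
t = np.where(p >= k, np.log(p / k), 0.0)

# f(T0) = -Tr(rho T0) + log Tr(e^{T0}); the d - r zero modes contribute e^0 = 1 each.
f_T0 = float(-np.sum(p * t) + np.log((d - r) + np.sum(np.exp(t))))
assert abs(S - f_T0) < eps                # Lemma 2: the gap stays below epsilon

# Rescaling T1 = T0 / c with c = Tr(T0) makes T1 a rank-r density matrix,
# and g(T1) from Eq. (10) coincides with f(T0), as used in Proposition 2.
c = float(np.sum(t))
g_T1 = float(-c * np.sum(p * (t / c)) + np.log((d - r) + np.sum(np.exp(c * (t / c)))))
print(S, f_T0, g_T1)
```

In this diagonal sketch, all \(p_i \ge k\) generically holds, so the gap reduces to \(\log (1 + k(d-r))\), comfortably below \(\varepsilon \).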

4 Von Neumann entropy estimation with quantum neural networks

We now explain the estimation of the von Neumann entropy using quantum neural networks, focusing on parameterized quantum circuits as an example. Our approach is inspired by the work of Liu et al. [32], who utilized variational autoregressive networks and quantum circuits to address problems in quantum statistical mechanics. To this end, we parameterize T by real numbers \(t_i \in \mathbb {R}\) and complex vectors \(\left| \psi _i\right\rangle \in \mathbb {C}^d\). Assuming that the rank of \(\rho \) is r, we define \(T = \sum ^r_{i=1} t_i \left| \psi _i\right\rangle \left\langle \psi _i\right| \).
Consequently, the function g(T) becomes \(g(T) = -c\sum ^r_{i=1} t_i \left\langle \psi _i\right| \rho \left| \psi _i\right\rangle + \log \left( d-r+\sum ^r_{i=1} e^{ct_i}\right) \). We can introduce a unitary operator U that transforms \(\left| \psi _i\right\rangle \) into \(\left| i\right\rangle \), and represent this unitary operator using a set of parameters \(\theta \) as \(U(\theta )\) as follows:
$$\begin{aligned} g(T) = -c\sum ^r_{i=1} t_i \left\langle i\right| U(\theta )\rho U^{\dagger }(\theta )\left| i\right\rangle + \log \left( d-r+\sum ^r_{i=1} e^{ct_i}\right) . \end{aligned}$$
(12)
By considering \(U(\theta )\) as a quantum neural network and \(\rho \) as its input, we can obtain the network output by computing \(U(\theta )\rho U^{\dagger }(\theta )\). To calculate g(T) with an error smaller than \(\varepsilon \), it is necessary to measure the output of the quantum neural network \(O\left( \frac{\text {Var}(\{ct_i\})}{\varepsilon ^{2}}\right) = O\left( \frac{c^2}{\varepsilon ^{2}}\right) \) times.
Our objective is to optimize the parameters to determine the infimum of g(T). For example, let us consider a parameterized quantum circuit [22] with Pauli gates as a quantum neural network \(U(\theta ) = \prod ^k_{i=1} U(\theta _i)\), where \(U(\theta _i) = e^{-i\frac{\theta _i}{2}P_i}\). By applying the parameter shift rule [28], we observe that
$$\begin{aligned} \nabla _{\theta }g(t, \theta ) = \frac{1}{2}\left[ g\left( t, \theta +\frac{\pi }{2}\right) -g\left( t, \theta -\frac{\pi }{2}\right) \right] , \end{aligned}$$
(13)
and
$$\begin{aligned} \frac{\partial g(t, \theta )}{\partial t_i} = -c\left\langle i\right| U(\theta )\rho U^{\dagger }(\theta )\left| i\right\rangle + \frac{ce^{ct_i}}{d-r+\sum ^r_{j=1} e^{ct_j}}. \end{aligned}$$
(14)
To satisfy the constraints \(t_i \ge 0\) and \(\sum ^r_{i=1} t_i = 1\), we choose \(t_i = \left( \prod ^{i-1}_{j=1} \sin ^2\varphi _j\right) \left( \cos ^2\varphi _i\right) \) for \(i < r\) and \(t_r = \prod ^{r-1}_{j=1} \sin ^2\varphi _j\). We can apply gradient descent to \(\varphi _j\) and \(\theta _i\) to optimize the quantum circuit. To calculate the gradient, we require \(O\big (\frac{c^2}{\varepsilon ^{2}} \times (\# \text {of parameters in QNN})\big )\) copies of \(\rho \). Therefore, to obtain \(\inf \left( g\left( T\right) \right) \) and estimate \(S\left( \rho \right) \) with an error of less than \(\varepsilon \), we require
$$\begin{aligned} O&\left( \frac{c^2}{\varepsilon ^{2}} \times \left( \# \text { of parameters in QNN}\right) \times \left( \# \text { of trainings in QNN}\right) \right) \nonumber \\&= O\left( \frac{r^2}{\varepsilon ^2}\left( n^2+\log ^2\left( \frac{1}{\varepsilon }\right) \right) n_\text {params}n_\text {train}\right) \end{aligned}$$
(15)
copies of \(\rho \).
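The hyperspherical parameterization of the spectrum \(\{t_i\}\) can be sketched as follows; here \(r-1\) unconstrained angles produce r nonnegative weights summing to one, with the last component carrying no cosine factor (helper name is ours):

```python
import numpy as np

def simplex_coords(phi):
    """Map r-1 unconstrained angles to t in R^r with t_i >= 0 and sum(t) = 1.
    Convention: t_i = (prod_{j<i} sin^2 phi_j) cos^2 phi_i for i < r,
    and the last component t_r = prod_{j=1}^{r-1} sin^2 phi_j."""
    s = np.concatenate(([1.0], np.cumprod(np.sin(phi) ** 2)))   # prefix products
    c = np.concatenate((np.cos(phi) ** 2, [1.0]))               # cos^2 factors, last = 1
    return s * c

rng = np.random.default_rng(3)
phi = rng.uniform(0, np.pi / 2, size=7)   # r = 8 components from 7 angles
t = simplex_coords(phi)
print(t.sum())                            # 1.0 (up to rounding): a valid spectrum
```

Because the sum telescopes to one exactly, gradient descent on the angles \(\varphi _j\) never leaves the probability simplex, so the trace constraint on T is maintained throughout training.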
Analytic gradient measurements for convex loss functions require \(O(\frac{n^3}{\varepsilon ^2})\) copies of \(\rho \) to converge to a solution within \(O(\varepsilon )\) of the optimum [33]. In general, situations that involve parameterized quantum circuits may have nonconvex loss functions, but many algorithms still utilize parameterized quantum circuits and achieve quantum speedups. We anticipate that a quantum speedup can be achieved by employing parameterized quantum circuits with analytic gradient measurements in QMINE and estimating the von Neumann entropy using \(O(\text {poly}(r))\) copies of \(\rho \). In future research, we will investigate the relationships between \(n_\text {train}\), \(n_\text {params}\), and the performance of this approach. The key point is that we transform the quantum mutual information estimation problem into a quantum neural network problem.

5 Numerical simulations

We demonstrate the performance of QMINE in estimating the quantum mutual information of random density matrices through numerical simulations of a quantum circuit. Our goal is to show that QMINE can estimate quantum mutual information with low error. We also analyze the rank and the number of trainable parameters, and conduct simulations that support the results on QDVR.

5.1 Rank analysis

Based on QDVR, we establish that if the rank of the density matrix \(\rho \) is r, then setting the rank of the parameter matrix T to r is sufficient. Thus, we aim to determine the optimal T that estimates the von Neumann entropy. To investigate the effect of rank, we experimented with the rank of T by letting \(r=\text {rank}(\rho )\) and \(k=\text {rank}(T)\). In this analysis, we simulated the scenario with \(N=5, D=30, r=8\), and \(c \le 80\), where N is the number of qubits, D is the circuit depth, r is the rank of the density matrix, and c is calculated using QDVR (details are provided in “Appendix B”). Figure 2 shows that when \(k \ge r\), the result of QMINE converges to the correct value, whereas when \(k < r\), it converges with a high error rate. This phenomenon has also been observed in other cases. These results support QDVR's claim that the rank of the optimal solution T is r. Because convergence is faster when \(k=r\) than when \(k>r\), it is best to use QMINE with \(k=r\).

5.2 Number of trainable parameters on quantum circuit analysis

We analyzed the performance of QMINE while varying the depth of the quantum circuit. In our simulations, we used \(N=5\), \(r=k=8\), \(c \le 80\), and circuit depths up to \(D=30\). The experimental results show that increasing the circuit depth, and hence the number of parameters, improves the estimation accuracy of QMINE up to a point. Figure 3 illustrates the results: a circuit depth of 20 achieved the best performance, converging rapidly with a lower error, whereas a depth of 30 converged more slowly despite reaching a similar error. These findings emphasize the importance of choosing an appropriate circuit depth (i.e., number of parameters) in QMINE. The copy complexity is determined by the number of parameters (\(n_\text {params}\)) and the number of training iterations (\(n_\text {train}\)). Therefore, when applying QMINE in various situations, it is crucial to select the correct circuit depth. We plan to investigate this aspect in future studies.

5.3 Estimating quantum mutual information

We estimated the quantum mutual information of a random density matrix using simulations with \(N=4\) qubits. For each tested random density matrix, we achieved error rates ranging from \(0.1\%\) to \(1\%\). Additional details can be found in “Appendix B” (Fig. 4).

6 Conclusions

We have addressed the quantum Donsker-Varadhan representation, which is a mathematical framework for estimating von Neumann entropy. The QDVR allows us to find the optimal T by searching only within the density matrices, resulting in low copy complexity for calculations. By optimizing the quantum neural network using QDVR and the parameter shift rule, we can estimate von Neumann entropy and subsequently estimate the quantum mutual information. The number of copies of \(\rho \) required is approximately \(O\left( \frac{r^2}{\varepsilon ^2}\left( n^2+\log ^2\left( \frac{1}{\varepsilon }\right) \right) n_\text {params}\cdot n_\text {train}\right) \).
Through the numerical simulations, we demonstrated that quantum mutual information neural estimation (QMINE) performs well and aligns with the results of the quantum Donsker-Varadhan representation. The rank analysis supported the results of QDVR, whereas the circuit depth analysis emphasized the importance of selecting an appropriate circuit depth. In addition, we estimated the quantum mutual information and achieved a low error rate. The key finding of this study is the conversion of the quantum mutual information and von Neumann entropy estimation problem into a quantum neural network problem. In future studies, we suggest investigating how \(n_\text {params}\) and \(n_\text {train}\) scale for this quantum neural network problem.

Acknowledgements

This work was supported by the National Research Foundation of Korea (NRF) through grants funded by the Ministry of Science and ICT (NRF-2022M3H3A1098237) and the Ministry of Education (NRF-2021R1I1A1A01042199). This work was partially supported by an Institute for Information & Communications Technology Promotion (IITP) grant funded by the Korean government (MSIP) (No. 2019-0-00003; Research and Development of Core Technologies for Programming, Running, Implementing, and Validating of Fault-Tolerant Quantum Computing Systems).

Declarations

Conflict of interest

The authors have no conflicts to disclose.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Appendix

Proof of Lemma 2

Here, we provide an explicit proof of Lemma 2 and details of the numerical simulation results.
For given \(\rho = \sum ^r_{i=1} p_i | \psi _i \rangle \!\langle \psi _i |\), let us define \(T_0 = \sum ^r_{i=1} t_i| \psi _i \rangle \!\langle \psi _i |\) with \(t_i = {\left\{ \begin{array}{ll} \log (\frac{p_i}{k}), &{} {\text {if}}\;\;p_i \ge k; \\ 0, &{} {\text {if}}\;\;p_i < k \end{array}\right. }\) and \(k=\frac{\varepsilon }{d^2}\). Then, the bound on the value of \(\left| S\left( \rho \right) -f\left( T_0\right) \right| \) can be derived as:
$$\begin{aligned} \left| S\left( \rho \right) - f\left( T_0\right) \right|&= \left| S\left( \rho \right) + {\text {Tr}}\left( \rho T_0\right) - \log \left( {\text {Tr}}(e^{T_0})\right) \right| \\&= \left| \sum ^r_{i=1} p_i\log \left( \frac{1}{p_i}\right) + \sum _{p_i \ge k} p_i\log \left( \frac{p_i}{k}\right) - \log \left( \sum _{p_i<k} 1 + \sum _{p_i \ge k} \frac{p_i}{k}\right) \right| \\&= \left| \sum _{p_i< k} p_i\log \left( \frac{1}{p_i}\right) + \sum _{p_i \ge k} p_i\log \left( \frac{1}{k}\right) - \log \left( \sum _{p_i<k} 1 + \sum _{p_i \ge k} \frac{p_i}{k}\right) \right| \\&= \left| \sum _{p_i< k} p_i\log \left( \frac{1}{p_i}\right) + \sum _{p_i \ge k} p_i\log \left( \frac{1}{k}\right) \right. \\&\quad \left. -\log \left( \frac{1}{k}\right) +\log \left( \frac{1}{k}\right) -\log \left( \sum _{p_i<k} 1 + \sum _{p_i \ge k} \frac{p_i}{k}\right) \right| \\&= \left| \sum _{p_i< k} p_i\log \left( \frac{k}{p_i}\right) - \log \left( \sum _{p_i<k} k + \sum _{p_i \ge k} p_i\right) \right| \\&= \left| \sum _{p_i< k} p_i\log \left( \frac{1}{p_i}\right) + \sum _{p_i< k} p_i\log \left( \frac{1}{k}\right) + \log \left( 1 + \sum _{p_i<k} \left( k-p_i\right) \right) \right| \\&\le 2dk\log \left( \frac{1}{k}\right) + dk = \frac{2\varepsilon ^2\left( 2\log d + \log \left( \frac{1}{\varepsilon }\right) \right) }{d} + \frac{\varepsilon }{d} \\&< \varepsilon . \end{aligned}$$
That is, \(\left| S(\rho )-f(T_0)\right| < \varepsilon \). Finally, \({\text {Tr}}\left( T_0\right) \) is estimated as
$$\begin{aligned} {\text {Tr}}\left( T_0\right)&= \sum _{p_i \ge k} \log \left( \frac{p_i}{k}\right) \le \sum _{p_i \ge k} \log \left( \frac{1}{k}\right) \\&\le r\log \left( \frac{1}{k}\right) = 2r\log d + r\log \left( \frac{1}{\varepsilon }\right) \\&= 2rn+r\log \left( \frac{1}{\varepsilon }\right) . \end{aligned}$$
This implies that there exists a positive Hermitian matrix \(T_0\) such that \({\text {Tr}}(T_0) = O(rn+r\log \left( \frac{1}{\varepsilon }\right) )\) and \(\left| S\left( \rho \right) -f\left( T_0\right) \right| < \varepsilon \).                                                             \(\square \)
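The construction of \(T_0\) can be checked numerically on a diagonal example. The sketch below is illustrative (the spectrum and \(\varepsilon \) are our own choices); the objective \(f(T) = \log {\text {Tr}}(e^{T}) - {\text {Tr}}(\rho T)\) is read off from the first line of the derivation above, and everything is evaluated in the shared eigenbasis of \(\rho \) and \(T_0\):

```python
import numpy as np

d = 4                                     # dimension (n = log2(d) qubits)
p = np.array([0.9899, 0.01, 0.0001, 0.0]) # eigenvalues of rho (rank r = 3)
eps = 0.1
k = eps / d**2

# t_i = log(p_i / k) if p_i >= k, else 0  (np.maximum avoids log(0) warnings;
# the np.where mask discards those entries anyway)
t = np.where(p >= k, np.log(np.maximum(p, k) / k), 0.0)

# von Neumann entropy S(rho) = -sum_i p_i log p_i (natural log)
S = -np.sum(p[p > 0] * np.log(p[p > 0]))

# f(T_0) = log Tr(e^{T_0}) - Tr(rho T_0) on the common eigenbasis
f = np.log(np.sum(np.exp(t))) - np.sum(p * t)

assert abs(S - f) < eps  # Lemma 2: |S(rho) - f(T_0)| < eps
```

When every \(p_i \ge k\) and \(r = d\), the bound is tight: \(f(T_0)\) reduces to \(-\sum _i p_i \log p_i\) exactly, and the error comes entirely from the eigenvalues truncated below \(k\).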

Details on numerical simulations

To support our observations, we explain the details of the numerical simulations for estimating the quantum mutual information, which can be expressed in terms of von Neumann entropies as follows:
$$\begin{aligned} I\left( A:B\right)&= S(\rho ^A) + S(\rho ^B) - S(\rho ^{AB}) \\&= S\left( \rho ^A\otimes \rho ^B\right) - S(\rho ^{AB}). \end{aligned}$$
To obtain the quantum mutual information, we adopted an alternative, simpler strategy. Exploiting QMINE (suggested in Sect. 4), we directly estimate \(S(\rho ^A\otimes \rho ^B)\) and \(S(\rho ^{AB})\); that is, we address \(S(\rho ^A\otimes \rho ^B)\) rather than estimating \(S(\rho ^{A})\) and \(S(\rho ^{B})\) separately. This method reduces the number of state copies required for the simulations.
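The identity \(I(A:B) = S(\rho ^A\otimes \rho ^B) - S(\rho ^{AB})\) can be verified with exact linear algebra on a two-qubit Bell state (an illustrative check of the identity itself, not of the QMINE estimator):

```python
import numpy as np

def entropy(rho):
    """von Neumann entropy (natural log) from the eigenvalues of rho."""
    w = np.linalg.eigvalsh(rho)
    w = w[w > 1e-12]  # discard numerically zero eigenvalues (0 log 0 = 0)
    return float(-np.sum(w * np.log(w)))

# |Phi+> = (|00> + |11>) / sqrt(2), a maximally entangled pure state
psi = np.array([1.0, 0.0, 0.0, 1.0]) / np.sqrt(2)
rho_AB = np.outer(psi, psi)

# Partial traces: reshape to indices (a, b, a', b'), then trace out b or a
rho_4 = rho_AB.reshape(2, 2, 2, 2)
rho_A = np.trace(rho_4, axis1=1, axis2=3)  # trace over B
rho_B = np.trace(rho_4, axis1=0, axis2=2)  # trace over A

# I(A:B) = S(rho_A (x) rho_B) - S(rho_AB); the pure state contributes 0
qmi = entropy(np.kron(rho_A, rho_B)) - entropy(rho_AB)

assert abs(qmi - 2 * np.log(2)) < 1e-9  # maximal QMI for two qubits: 2 ln 2
```

For the Bell state, \(S(\rho ^{AB}) = 0\) and \(\rho ^A = \rho ^B = I/2\), so the product-state entropy \(2\ln 2\) is the entire quantum mutual information, matching the maximal value for two qubits.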
We used four qubits for this simulation, and the results of our experiment are summarized in Table 1. To show that QMINE can estimate the quantum mutual information for various density matrices, we present estimation results for different ranks of \(\rho _{AB}\).
Table 1 Estimations of quantum mutual information using the QMINE method

| Rank | QMI       | Estimation result | Error rate (%) |
|------|-----------|-------------------|----------------|
| 1    | 1.8048002 | 1.7946120         | 0.565          |
| 2    | 1.7631968 | 1.7493981         | 0.783          |
| 4    | 1.3031208 | 1.2902124         | 0.991          |
| 8    | 0.8440226 | 0.8376048         | 0.760          |
| 16   | 0.4172888 | 0.4163618         | 0.222          |
Literature
1. Jaeger, G.: Quantum Information: An Overview. Springer, New York (2007)
2. Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information. Cambridge University Press, Cambridge (2000)
3. Wilde, M.M.: Quantum Information Theory. Cambridge University Press, Cambridge (2017)
6. Biamonte, J., Wittek, P., Pancotti, N., Rebentrost, P., Wiebe, N., Lloyd, S.: Quantum machine learning. Nature 549, 195 (2017)
7. Carleo, G., Cirac, I., Cranmer, K., Daudet, L., Schuld, M., Tishby, N., Vogt-Maranto, L., Zdeborová, L.: Machine learning and the physical sciences. Rev. Mod. Phys. 91, 045002 (2019)
8. Bengtsson, I., Życzkowski, K.: Geometry of Quantum States: An Introduction to Quantum Entanglement. Cambridge University Press, Cambridge (2017)
9. O’Donnell, R., Wright, J.: Efficient quantum tomography. In: Proceedings of the 48th Annual ACM Symposium on Theory of Computing, pp. 899–912 (2016)
10. Hastings, M.B., González, I., Kallin, A.B., Melko, R.G.: Measuring Renyi entanglement entropy in quantum Monte Carlo simulations. Phys. Rev. Lett. 104, 157201 (2010)
11. Calabrese, P., Cardy, J., Doyon, B.: Entanglement entropy in extended quantum systems. J. Phys. A: Math. Theor. 42, 500301 (2009)
13. Acharya, J., Issa, I., Shende, N.V., Wagner, A.B.: Estimating quantum entropy. IEEE J. Select. Areas Inf. Theory 1, 454 (2020)
14. Tan, K.C., Volkoff, T.: Variational quantum algorithms to estimate rank, quantum entropies, fidelity, and Fisher information via purity minimization. Phys. Rev. Res. 3, 033251 (2021)
16.
18. Subramanian, S., Hsieh, M.-H.: Quantum algorithm for estimating \(\alpha \)-Renyi entropies of quantum states. Phys. Rev. A 104, 022428 (2021)
19. Von Neumann, J.: Mathematische Grundlagen der Quantenmechanik. Springer, New York (1996)
20.
21. Wan, K.H., Dahlsten, O., Kristjánsson, H., Gardner, R., Kim, M.S.: Quantum generalisation of feedforward neural networks. NPJ Quantum Inf. 3, 36 (2017)
22. Benedetti, M., Lloyd, E., Sack, S., Fiorentini, M.: Parameterized quantum circuits as machine learning models. Quantum Sci. Technol. 4, 043001 (2019)
23.
25. Shannon, C.E.: A mathematical theory of communication. Bell Syst. Tech. J. 27, 379 & 623 (1948)
26. Belghazi, M.I., Baratin, A., Rajeswar, S., Ozair, S., Bengio, Y., Courville, A., Hjelm, R.D.: MINE: Mutual Information Neural Estimation. arXiv:1801.04062
27. Donsker, M.D., Varadhan, S.R.S.: Asymptotic evaluation of certain Markov process expectations for large time–III. Commun. Pure Appl. Math. 29, 389 (1976)
28. Mitarai, K., Negoro, M., Kitagawa, M., Fujii, K.: Quantum circuit learning. Phys. Rev. A 98, 032309 (2018)
29. Rebentrost, P., Mohseni, M., Lloyd, S.: Quantum support vector machine for big data classification. Phys. Rev. Lett. 113, 130503 (2014)
30. Havlíček, V., Córcoles, A.D., Temme, K., Harrow, A.W., Kandala, A., Chow, J.M., Gambetta, J.M.: Supervised learning with quantum-enhanced feature spaces. Nature 567, 209 (2019)
31. Huber, A.: Variational principles in quantum statistical mechanics. In: Mathematical Methods in Solid State and Superfluid Theory: Scottish Universities’ Summer School, pp. 364–392 (1968)
32. Liu, J.-G., Mao, L., Zhang, P., Wang, L.: Solving quantum statistical mechanics with variational autoregressive networks and quantum circuits. Mach. Learn. Sci. Technol. 2, 025011 (2021)
33. Harrow, A.W., Napp, J.C.: Low-depth gradient measurements can improve convergence in variational hybrid quantum-classical algorithms. Phys. Rev. Lett. 126, 140502 (2021)
Metadata
Title
Estimating quantum mutual information through a quantum neural network
Authors
Myeongjin Shin
Junseo Lee
Kabgyun Jeong
Publication date
01-02-2024
Publisher
Springer US
Published in
Quantum Information Processing / Issue 2/2024
Print ISSN: 1570-0755
Electronic ISSN: 1573-1332
DOI
https://doi.org/10.1007/s11128-023-04253-1
