Skip to main content
Top

Approximate real-time evolution operator for potential with one ancillary qubit and application to first-quantized Hamiltonian simulation

  • Open Access
  • 01-03-2025
Published in:

Activate our intelligent search to find suitable subject content or patents.

search-config
loading …

Abstract

The article delves into the efficient implementation of unitary diagonal matrices, which are fundamental for simulating first-quantized Hamiltonians and various quantum algorithms. It summarizes previous methods, such as sequency-ordered Walsh operators and arithmetic operations with phase kickback, and introduces novel techniques like the linearly interpolated unitary diagonal matrix (LIU) and a modified piecewise polynomial approximation (PPP). The study provides explicit quantum circuit implementations for these methods, evaluating their gate counts and circuit depths in relation to grid parameters and desired precision. Notably, the article compares these methods through case studies involving modified Coulomb potentials and damped oscillating functions, offering insights into their practical performance. Furthermore, it applies these methods to first-quantized Hamiltonian simulations, estimating the required quantum resources and highlighting the significance of approximation errors. The findings underscore the importance of considering both Trotter-Suzuki and approximation errors in quantum simulations, providing a strategic approach to selecting the optimal quantum circuit for given potential functions and error bounds.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

1 Introduction

We investigate the efficient quantum circuits for unitary diagonal matrices, which can also be regarded as real-time evolution operators of diagonal matrices. Such quantum circuits play a fundamental role in the simulation of first-quantized Hamiltonian [15] as well as subroutines of many quantum circuits/algorithms, e.g., ground state preparation by using the imaginary-time evolution [610]. In this paper, considering the application to Hamiltonian simulations, we assume that the entries take values of some given potential function at \(N=2^n\) (\(n\in \mathbb {N}\)) uniform grid points. The implementation of a given unitary diagonal matrix is no longer an easy task when n becomes large. In the following context, we summarize several methods in previous papers.
Exact implementations Based on the gate set \(\{\textrm{CNOT}, R_z\}\), it is well-known that such a diagonal unitary operation on n qubits can be precisely implemented using \(2^n\) multifold z rotation gates, corresponding to Walsh operators, whose rotation angles are proportional to the Hadamard transform of the given diagonal components [1113]. Later, Welch et al. [14] gave an optimal circuit in CNOT count using sequency order based on Gray code [15, 16], and Zhang et al. [17] further reduced the circuit depth by commuting and parallelizing quantum gates in a collective manner. Moreover, the circuit for the diagonal unitary matrix with reflection symmetry was discussed in [18], and another half reduction was achieved due to the symmetry. Although such optimal circuits reduce the depth up to a factor of 2 or 4 without any ancillary qubit, the total gate count/circuit depth remains the order of \(\mathcal {O}(2^n)\), which is still a tough task for large n. As for a phase-sparse diagonal unitary matrix, the authors in [19, 20] showed that such exponential growth could be reduced to \(\mathcal {O}(k n^2)\) where k is the number of distinct values in the diagonal.
Piecewise polynomial approximations Based on multi-controlled phase gates and quantum comparators, there is a way to approximate the given potential function by a piecewise polynomial and directly load the approximate function in the phase factor. Although this idea has already been proposed in the first decade of this century by Benenti and Strini [21], it is not intensively discussed to the authors’ best knowledge. Owing to the polynomial approximation, the total gate count is a polynomial of n. Moreover, only one ancillary qubit is needed for storing the results of quantum comparators. Recent works discussed the application of such polynomial phase gates to first-quantized Hamiltonian simulation [22, 23], while the detailed gate count under a given error bound was not discussed. An alternative implementation of the piecewise polynomial approximation in the phase factor is to introduce several ancilla registers to store the function and intermediate calculations, compute the desired function in an ancilla register, and then kick back the values in the register to the phase factor. In [5, 24], the authors suggested using arithmetic operations to calculate the approximate polynomial functions in the ancilla register. Recently, the papers [2527] provided detailed circuits on the loading of polynomial coefficients in ancilla registers by quantum read-only memory (QROM) [25] and on the parallel implementation of the polynomials by introducing a label/selection register. Compared to the above polynomial phase gate, this method requires more ancillary qubits to conduct the arithmetic operations and store the intermediate calculations. On the other hand, it may be slightly more efficient in gate count.
This paper is devoted to the explicit quantum circuits of several different methods, including a modified version of the aforementioned piecewise polynomial approximation and a novel one based on a linearly interpolated unitary diagonal matrix. Our main contributions are listed as follows:
  • In Sect. 2, we restate the previous methods and propose an original method and a modified one. These methods can be summarized in the framework of piecewise polynomial approximations with polynomial degrees of 0 (piecewise constant), 1 (piecewise linear), and p (piecewise higher degree \(p\ge 2\)). By introducing a coarse-graining parameter, analytically calculated from the given potential function and the desired precision (i.e., the error bound), we provide the detailed quantum circuit implementations of each methods in “Appendix A.”
  • In Sect. 3, with the help of the coarse-graining parameter, we first evaluate the explicit gate counts for different methods as functions of some norms of the potential function, the grid parameter and the desired precision. It clarifies quantitatively how the desired precision and the function to be implemented would influence the required quantum resources. Then, owing to the analytical evaluations, we compare different methods concerning both the asymptotic gate counts and the explicit gate counts under several practical settings of potential functions and desired precisions. We find that the method with the best asymptotic gate count may not be optimal in practical settings since the prefactor of the gate count can be dominant when the grid parameter and the desired error bound are specified according to the applications. Therefore, we provide a strategy to figure out the best quantum circuit among the considered ones when the potential function and the error bound are given.
  • In Sect. 4, we employ the grid-based method for the first-quantized Hamiltonian simulation. The quantum circuit consists of shifted quantum Fourier transforms and diagonal unitary matrices. Applying the above discussions, we estimate the required quantum resources for the Hamiltonian simulation of a particle system with respect to the space dimension, particle numbers, simulation time and error bound (see Table 3). It shows the error coming from the approximation of the potential function has a similar contribution as the Trotter-Suzuki error. Moreover, we provide two novel techniques: (a) achieving linear depth regarding the particle number for two-particle interaction potentials without using additional dummy registers; (b) implementing distance register using quantum Fourier transforms; to minimize the use of ancillary qubits. We believe such techniques are compatible with the early fault-tolerant quantum computers for which the number of logical qubits is limited.

2 Methods

Let \(N=2^n\) for a given grid parameter \(n\in \mathbb {N}\). For \(L>0\), we introduce a uniform division of the interval [0, L] as \(\Pi \): \(0 = x_0< x_1< \cdots < x_N = L\), where \(x_j = jL/N\), \(j=0,1,\ldots ,N\). For a given sufficiently smooth function V, we aim at approximate circuits to implement the following unitary diagonal matrix:
$$\begin{aligned} e^{-\textrm{i}V_N\Delta \tau } = \sum _{j=0}^{N-1}e^{-\textrm{i}v_j\Delta \tau } {|{j}\rangle }{\langle {j}|}, \end{aligned}$$
where \(v_j = V(x_j)\), \(j=0,1,\ldots ,N-1\). As we can put the time step factor \(\Delta \tau \) into \(V_N\), without loss of generality, here we take \(\Delta \tau =1\) for simplicity. In the following contexts, we review previous methods and provide a new method for the above purpose.

2.1 Sequency-ordered Walsh operators (WAL)

Under the gate set \(\{\textrm{CNOT}, R_z\}\), the exact implementation of the unitary diagonal matrices can be done by using sequency-ordered Walsh operators [1114, 17]. The optimized circuit in this direction is given by [17] (see “Appendix A.1” for more details). The gate count/depth is scaled as \(2^n\) with relatively small pre-factor 1 for the depth and pre-factor 2 for the gate count, which indicates that such a method is a good choice for small grid parameter n [28]. For a large grid parameter n, as suggested by Welch et al. in [14], one can choose a coarse-graining parameter \(m_0<n\) and apply the sequency-ordered Walsh operators only to the top \(m_0\) qubits to derive an approximate operation of the original unitary operation. If we denote the circuit for the unitary diagonal operation on k qubits in [17] by \(U_{\text {D}}(V,k)\), then the quantum circuit of WAL is described in Fig. 1, where \(m=m_0\) depends on the potential function and the precision. For the suitable choice of \(m_0\), we refer to “Appendix A.1.” Intuitively, if n is strictly larger than \(m_0\), then this method yields a quantum circuit corresponding to a uniform piecewise constant approximation of the given potential function.
Fig. 1
Quantum circuit of WAL based on sequency-ordered Walsh operators. Throughout this paper, we locate the top (most significant) qubits in the lower part of the circuits
Full size image

2.2 Linearly interpolated unitary diagonal matrix (LIU)

We propose a novel method by constructing a linearly interpolated unitary diagonal matrix from an original unitary diagonal matrix. Intuitively, the idea is to obtain a circuit for a uniform piecewise linear approximation. In other words, we first calculate the values at \(2^{m_1}\) (\(m_1<n\)) coarse grid points and then apply linear interpolation at fine grid points. Let \(1\le m_1<n\) be an integer describing the parameter of a coarse division and let \(M=2^{m_1}\). We assume the access to a diagonal unitary operation and its fractional binary power:
$$\begin{aligned} U_M := \sum _{k=0}^{M-1} \exp \left( -\textrm{i}u_{k}\right) {|{k}\rangle }{\langle {k}|}, \ U_M^{(j)} := \sum _{k=0}^{M-1} \exp \left( -\textrm{i}u_{k}/2^j\right) {|{k}\rangle }{\langle {k}|}, \ j=0,1,\ldots , n-m_1, \end{aligned}$$
where we set \(u_k = v_{kN/M}\), \(k=0,1,\ldots ,M-1\). In a general case, the fractional binary power of a given diagonal unitary operation can be derived using the quantum circuit in [29]. By applying an increment gate defined by
$$\begin{aligned} U_{\text {+1}} {|{k}\rangle }_{m_1} := {|{k+1}\rangle }_{m_1},\ k=0,1,\ldots ,M-1, \end{aligned}$$
we introduce a phase difference diagonal operation \(W_{j}\):
$$\begin{aligned} W_{j} {|{k}\rangle }_{m_1} = U_{\text {+1}}^\dag U_M^{(j)} U_{\text {+1}} U_M^{(j)\dag } {|{k}\rangle }_{m_1} = e^{-\textrm{i}(u_{k+1}-u_{k})/2^{j}} {|{k}\rangle }_{m_1}, \end{aligned}$$
for \(k=0,1,\ldots ,M-1,\, j = 0,1,\ldots , n-m_1\). Here, we set \(u_{M}:= u_0\).
Fig. 2
a Proposed quantum circuit of LIU based on a linearly interpolated diagonal unitary matrix. The controlled \(W_j\) gate with control qubit \({|{k_p^\prime }\rangle }\) and target register \({|{k}\rangle }_m\) is denoted by \(\textrm{C}W_j^{(p)}\). b Quantum circuit of \(\textrm{C}W_j^{(p)}\) using two \(U_M^{(j)}\), a controlled increment gate \(U_{+1}\), and its inverse
Full size image
By denoting \({|{k^\prime }\rangle }_{n-m_1} = {|{k_{n-m_1-1}^\prime }\rangle }\cdots {|{k_{0}^\prime }\rangle }\) and \({|{j}\rangle }_n = {|{k}\rangle }_{m_1} \otimes {|{k^\prime }\rangle }_{n-m_1}\) for \(j=0,1,\ldots , N-1\), we employ the controlled \(W_j\) gate \(\text {C}W_j^{(p)}\), \(p=0,1,\ldots ,n-m_1-1\) whose upper index indicates the control qubit \({|{k_p^\prime }\rangle }\) (see Fig. 2b) and the diagonal unitary operation \(U_M^{(j)}\) to obtain
$$\begin{aligned} U_N{|{j}\rangle }_n&= \textrm{C}W_1^{(n-m_1-1)} \cdots \textrm{C}W_{n-m_1-1}^{(1)} \textrm{C}W_{n-m_1}^{(0)}(U_M\otimes I^{\otimes (n-m_1)}) \left( {|{k}\rangle }_{m_1} \otimes {|{k^\prime }\rangle }_{n-m_1}\right) \\ &= \textrm{C}W_1^{(n-m_1-1)} \cdots \textrm{C}W_{n-m_1-1}^{(1)} \textrm{C}W_{n-m_1}^{(0)} \left( \textrm{exp}\left( -\textrm{i} u_{k}\right) {|{k}\rangle }_{m_1} \otimes {|{k^\prime }\rangle }_{n-m_1}\right) \\ &= \exp \left( -\textrm{i}\left( u_{k}+\sum _{p=0}^{n-m_1-1}k_{p}^\prime 2^{p}(u_{k+1}-u_{k})/2^{n-m_1}\right) \right) \left( {|{k}\rangle }_{m_1} \otimes {|{k^\prime }\rangle }_{n-m_1}\right) \\ &= e^{-\textrm{i}{\tilde{v}}_j} {|{j}\rangle }_n. \end{aligned}$$
Here, we set \({\tilde{v}}_j:= u_k + k^\prime (u_{k+1}-u_k)/2^{n-m_1}\), which indicates that the circuit in Fig. 2 yields a diagonal unitary operation on n qubits whose diagonal entries are the linear interpolation of the diagonal entries of \(U_M\). In particular, we find \({\tilde{v}}_{\ell N/M} = u_{\ell } = v_{\ell N/M}\), \(\ell =0,\ldots ,M-1\). Then, \(U_N\) can be interpreted as a linearly interpolated matrix of the original matrix \(U_M\). Since we know every entry \(u_k\) using the underlying function V, we can directly implement the diagonal unitary operation \(U_M^{(j)}\) by \(U_{\text {D}}(V/2^j,m_1)\). The detailed gate implementation and the choice of parameter \(m_1\) are provided in “Appendix A.2.”
Our proposed method LIU works well if the potential function is periodic or localized such that the difference between the values at two endpoints is negligible. For a general function, we do not necessarily have \(u_M = u_0\), which implies that there may be an additional error in the last sub-interval of the division, and we may need extra multi-controlled phase gates to refine the circuit. An alternative way is to extend the function from [0, L] to [0, 2L] by \(V(x):= V(2\,L-x)\) for \(x\in [L,2\,L]\). Then, we derive a periodic function that can be dealt with. To maintain the desired precision, we need one zero-initialized ancillary qubit. In detail, we apply \(U_M^{(j)}\) to the top \(m_1\) qubits and the ancillary qubit, and the controlled increment gates are also targeted at these \(m_1+1\) qubits. The gate count discussed in “Appendix A.2” remains the same formula, but one needs to replace n and \(m_1\) by \(n+1\) and \(m_1+1\), respectively. A third way for modification relies on the direct implementation of the controlled \(W_j\) operation, given by the controlled unitary diagonal operation, which consists of \(2^{m_1}-2\) CNOT gates and \(2^{m_1}-1\) controlled phase gates. We will call this modified method mLIU in the following contexts. To avoid confusion, we mention a quantum interpolation algorithm for the quantum states, which is useful for preparing “finer" initial quantum states [30, 31]. However, such an interpolation circuit does not work for our purpose since it will change the diagonality of the operation.

2.3 Phase gate for piecewise-defined polynomial (PPP)

It is known that any function can be approximated by some suitable piecewise polynomial function for which its division and polynomial coefficients can be derived by classical algorithms. Then, the problem of finding an efficient approximate circuit for the evolution operator of the potential boils down to the efficient implementation of a given piecewise polynomial function in the phase factor. Employing a quantum comparator to distinguish the different sub-intervals of the piecewise polynomial, one solution is to use the polynomial phase gate for a given polynomial function f as follows [21, 23]:
$$\begin{aligned} U_{\text {ph}}[f] = \sum _{j=0}^{N-1} \textrm{exp}\left( -\textrm{i}f(x_j)\right) {|{j}\rangle }{\langle {j}|}. \end{aligned}$$
A single polynomial phase gate is constructed by several multi-controlled phase gates [23] for which we provide more details in “Appendix A.3” for completeness.
Here, we suggest a modified way to determine the piecewise polynomial approximation to a given function, where we introduce a coarse-graining parameter \(m\le n\) and let \(M=2^m\). We choose \(f({\tilde{x}}_{\ell }) = V({\tilde{x}}_{\ell })\) with some given knots \({\tilde{x}}_\ell \in \{x_{kN/M}\}_{k=0}^{M-1}\), \(\ell =0,1,\ldots ,{\tilde{M}}\) and \({\tilde{x}}_0 = x_0\), \({\tilde{x}}_{{\tilde{M}}} = x_N\). Since \({\tilde{M}}+1\) is the number of knots, \({\tilde{M}}\le M\) is the number of polynomials. Then, we can derive a piecewise polynomial function \({\tilde{V}}\) (using the values at these \({\tilde{M}}+1\) knots) by determining the polynomial coefficients with spline interpolation. In “Appendix A.3,” we give an analytic estimation of the parameters \(m,{\tilde{M}}\) so that \(\max _{x\in [0, L]}|{\tilde{V}}(x)-V(x)|\le \delta \) for a given precision \(\delta >0\), and we also provide practical classical algorithms for finding these parameters numerically with possibly varying degrees for each interval. As long as we have prepared the piecewise polynomial approximation:
$$\begin{aligned} {\tilde{V}}(x) = \sum _{\ell =0}^{{\tilde{M}}-1} \chi _{[{\tilde{x}}_{\ell }, {\tilde{x}}_{\ell +1})}(x) f_{\ell }(x), \quad x\in [0,L], \end{aligned}$$
(1)
where \(\chi _{A}\) is the characteristic function whose value is 1 for \(x\in A\) and 0 otherwise, we realize the approximate diagonal operation
$$\begin{aligned} e^{-\textrm{i}{\tilde{V}}_N} = \sum _{j=0}^{N-1} e^{-\textrm{i}{\tilde{V}}(x_j)}{|{j}\rangle }{\langle {j}|}, \end{aligned}$$
by employing phase gates for polynomial functions and quantum comparators as shown in Fig. 3. A quantum comparator is usually regarded as a circuit to compare the integers encoded in two quantum registers [32, 33]. In this article, to minimize the number of qubits, we suggest applying the quantum comparator for a given integer in [34]: \(\text {COMP(k)}{|{0}\rangle }\otimes {|{j}\rangle }:= {|{j< k}\rangle } \otimes {|{j}\rangle }\), which requires only one ancillary qubit for storing the comparison result. We discuss the detailed gate count and depth in “Appendix A.3,” and here we mention that the above circuit uses one polynomial phase gate, \({\tilde{M}}-1\) controlled polynomial phase gates, and \(2({\tilde{M}}-1)\) quantum comparators, including the part of uncomputation so that the ancillary qubit can be reused safely.
Fig. 3
Modified quantum circuit of PPP to implement a real-time evolution operator of an approximate potential by a piecewise polynomial function. Here, one can use the quantum comparator with a given integer \(k_\ell = M{\tilde{x}}_\ell /L\) in [34]. The difference from [22, 23] is that we apply the comparator only on the most significant m qubits of the register, and thus, we reduce the number of gates for the comparators if \(m<n\)
Full size image

2.4 Arithmetic operations and phase kickback (APK)

In the above circuits for WAL, LIU, and PPP, we directly implement the approximation \(e^{-\textrm{i}{\tilde{V}}(x_j)}\) in the phase factor:
$$\begin{aligned} {|{j}\rangle } \mapsto e^{-\textrm{i}{\tilde{V}}(x_j)}{|{j}\rangle }, \quad j=0,1,\ldots , N-1, \end{aligned}$$
using the polynomial phase gates. On the other hand, there is another way [5, 2527] to achieve it by introducing some auxiliary registers and using arithmetic operations. Let the piecewise polynomial approximation \({\tilde{V}}\) be given by Eq. (1). Then, the procedure is as follows:
1.
Introduce and initialize a label/selection register using the system register [26, Figure 1]:
$$\begin{aligned} {|{j}\rangle }{|{0}\rangle } \mapsto {|{j}\rangle }{|{\ell (j)}\rangle }, \end{aligned}$$
where \(\ell (j)\) denotes the index of the interval for each point \(x_j\).
 
2.
Introduce a register for data and input the piecewise polynomial coefficients \(d_\ell \) into the register by quantum read-only memory (QROM) [25, 27]:
$$\begin{aligned} {|{j}\rangle }{|{\ell (j)}\rangle }{|{0}\rangle } \mapsto {|{j}\rangle }{|{\ell (j)}\rangle }{|{d_{\ell (j)}}\rangle }. \end{aligned}$$
 
3.
Introduce a register for the temporary result and calculate the approximate polynomial function for each interval by applying some arithmetic operations (multiplications and additions) conditioned on the system register and the data register [26, 27]:
$$\begin{aligned} {|{j}\rangle }{|{\ell (j)}\rangle }{|{d_{\ell (j)}}\rangle }{|{0}\rangle } \mapsto {|{j}\rangle }{|{\ell (j)}\rangle }{|{d_{\ell (j)}}\rangle }{|{f_{\ell (j)}(x_j)}\rangle } = {|{j}\rangle }{|{\ell (j)}\rangle }{|{d_{\ell (j)}}\rangle }{|{{\tilde{V}}(x_j)}\rangle }. \end{aligned}$$
 
4.
Employ the phase kickback circuit [24], which substitutes phase (rotation) gates by an addition on the result register and a suitably prepared phase kickback register:
$$\begin{aligned} {|{j}\rangle }{|{\ell (j)}\rangle }{|{d_{\ell (j)}}\rangle }{|{{\tilde{V}}(x_j)}\rangle } {|{0}\rangle }&\mapsto {|{j}\rangle }{|{\ell (j)}\rangle }{|{d_{\ell (j)}}\rangle }{|{{\tilde{V}}(x_j)}\rangle }{|{\gamma (\xi )}\rangle }\\&\mapsto e^{-\textrm{i}\xi {\tilde{V}}(x_j)}{|{j}\rangle }{|{\ell (j)}\rangle }{|{d_{\ell (j)}}\rangle }{|{{\tilde{V}}(x_j)}\rangle }{|{\gamma (\xi )}\rangle }. \end{aligned}$$
Here, we take \(\xi =1\), and we can take, e.g., \(\xi =\Delta \tau \) for applications.
 
5.
Discard the unentangled ancillary qubits and do the uncomputation:
$$\begin{aligned} e^{-\textrm{i} {\tilde{V}}(x_j)}{|{j}\rangle }{|{\ell (j)}\rangle }{|{d_{\ell (j)}}\rangle }{|{{\tilde{V}}(x_j)}\rangle }{|{\gamma (\xi )}\rangle }&\mapsto e^{-\textrm{i} {\tilde{V}}(x_j)}{|{j}\rangle }{|{\ell (j)}\rangle }{|{d_{\ell (j)}}\rangle }{|{{\tilde{V}}(x_j)}\rangle } \\ &\mapsto e^{-\textrm{i} {\tilde{V}}(x_j)}{|{j}\rangle }{|{0}\rangle }{|{0}\rangle }{|{0}\rangle }. \end{aligned}$$
Here, the phase kickback register can be discarded, while one needs uncomputation for the result, data, and label/selection registers.
 
This method is based on arithmetic operations which are usually implemented by CNOT and Toffoli gates. On the other hand, it uses many ancillary qubits, e.g., hundreds of ancillary qubits for some elementary functions [26], which is unsuitable for early quantum computers. Although it is not the main concentration of this paper, we collect the possible implementations from previous works in “Appendix A.4,” and in the next section, we compare this method with the others in their performance on asymptotic gate count regarding grid parameter n and precision \(\delta \).

3 Gate count and comparison

We start by collecting the gate count/circuit depth for the methods in Sect. 2. The results are obtained under the explicit implementation described in “Appendix A.”
For WAL, considering the approximate implementation with a coarse-graining parameter \(m_0\), one needs \(2^{\min \{m_0,n\}}-1\) phase gates and \(2^{\min \{m_0,n\}}-2\) CNOT gates with depth \(2^{\min \{m_0,n\}}\). In this section, we omit the global phase and do not distinguish phase (rotation) gates from \(R_z\) gates due to their equivalence to a global phase. According to “Appendix A.1,” to guarantee the desired precision \(\delta >0\), we choose
$$\begin{aligned} m_0:= \Bigg \lceil \log _2 \left( L\frac{\Vert V^{(1)}\Vert _\infty }{\delta }\right) \Bigg \rceil , \end{aligned}$$
(2)
where \(\Vert V^{(k)}\Vert _\infty \) denotes the maximum norm of the k-th derivative of V.
For LIU, we use quantum Fourier transforms (QFTs) for the increment gate [34] and introduce a parameter \(m_1<n\). According to “Appendix A.2,” we implement the desired unitary diagonal matrix with \(4m_1(n-m_1)\) Hadamard gates, \(2(2^{m_1}+3m_1^2-1)(n-m_1)+2^{m_1}-1\) phase gates, and \(2(2^{m_1}+2m_1^2-2)(n-m_1)+2^{m_1}-2\) CNOT gates up to a global phase. Moreover, the depth under the gate set \(\{\textrm{H}, \textrm{CNOT}, R_z\}\) is \(2(2^{m_1}+16m_1-16)(n-m_1)+2^{m_1}\). On the other hand, since a controlled phase gate can be decomposed into 2 CNOT gates and 3 phase gates, mLIU needs \((3\cdot 2^{m_1}-3)(n-m_1)+2^{m_1}-1\) phase gates and \((3\cdot 2^{m_1}-4)(n-m_1)+2^{m_1}-2\) CNOT gates with depth \(2(2^{m_1+1}+16m_1)(n-m_1)+2^{m_1+1}\). For the desired precision \(\delta >0\), we take
$$\begin{aligned} m_1:=\Bigg \lceil \log _2 \left( L\sqrt{\frac{\Vert V^{(2)}\Vert _\infty }{8\delta }}\right) \Bigg \rceil . \end{aligned}$$
(3)
For PPP with a given polynomial degree p, we introduce a parameter \(m_p\). According to “Appendix A.3,” we need \((8m_1+4)({\tilde{M}}_1-1)\) Hadamard gates, \(n+(3n+12m_1^2+4m_1+3)({\tilde{M}}_1-1)\) phase gates, and \((8m_1^2+2n)({\tilde{M}}_1-1)\) CNOT gates with depth \(1+(4n+64m_1-32)({\tilde{M}}_1-1)\) for \(p=1\), and \((8m_2+4)({\tilde{M}}_2-1)\) Hadamard gates, \(n+3n(n-1)/2+(7n(n-1)/2+3n+12m_2^2+4m_2+3)({\tilde{M}}_2-1)\) phase gates, and \(n(n-1)+(4n(n-1)+2n+8m_2^2)({\tilde{M}}_2-1)\) CNOT gates with depth \(\mathcal {O}({\tilde{M}}_2(n^2+m_2))\) for \(p=2\). To guarantee the desired precision \(\delta >0\), we take
$$\begin{aligned} m_2 := \Bigg \lceil \log _2 \left( L\left( \frac{2\Vert V^{(3)}\Vert _\infty }{81\delta }\right) ^{\frac{1}{3}}\right) \Bigg \rceil , \quad \end{aligned}$$
and
$$\begin{aligned} {\tilde{M}}_2 \simeq {\hat{M}}_{2} := \Bigg \lceil \int _0^L \left( \frac{2|V^{(3)}(x)|}{81\delta }\right) ^{\frac{1}{3}} dx\Bigg \rceil , \quad {\tilde{M}}_1 \simeq {\hat{M}}_{1} := \Bigg \lceil \int _0^L \left( \frac{|V^{(2)}(x)|}{8\delta }\right) ^{\frac{1}{2}} dx\Bigg \rceil . \end{aligned}$$
Here, \({\hat{M}}_{p}\) is the approximation of \({\tilde{M}}_p\) up to a pre-factor (see “Appendices A.3, B.2”), and we choose the constants \(C_p\) in Eq. (A2) as \(C_1=1/8\) and \(C_2=2/81\), which correspond to the optimal constants for two-point Hermite interpolations [35]. By noting that j-controlled phase gate can be implemented by \(\mathcal {O}(j^2)\) CNOT gates and single-qubit gates without any ancillary qubit in linear depth \(\mathcal {O}(j)\) [36], we need \(\mathcal {O}((p^2n^p+m_p^2){\tilde{M}}_p)\) CNOT gates and single-qubit gates with depth \(\mathcal {O}((pn^p+m_p){\tilde{M}}_p)\) for general \(p\ge 3\). On the other hand, we have a comment on the special case that \(\Vert V^{(p+1)}\Vert _\infty =0\) for some \(p\in \mathbb {N}\), which means V itself is a polynomial of degree smaller than p. According to the error estimates for spline interpolations, \(m_p\) can be arbitrarily chosen (e.g., \(m_p=0\)), and we have \({\tilde{M}}_p=1\).
According to “Appendix A.4,” APK uses \(\mathcal {O}\left( (n+p)\delta ^{-1/(p+1)}+p(p+\log (1/\delta ))^2\right) \) Toffoli gates, \(\mathcal {O}\left( (n+p)\delta ^{-1/(p+1)}{+}p(p{+}\log (1{/}\delta ))^2\right) \) Clifford gates, and \(\mathcal {O}\left( (\log (1/\delta ))^2\right) \) phase gates with \(\mathcal {O}(1+p(p+\log (1/\delta )))\) initialized ancillary qubits.

3.1 Asymptotic gate count

Although we cannot directly compare the methods since their implementations are based on different gate sets, we can derive their asymptotic dependence on the parameters, which also provides information on the features of these methods. The asymptotic gate count and ancilla count concerning polynomial degree p, precision \(\delta \), and large grid parameter n (so that gate count of WAL is \(\mathcal {O}(\min \{2^n, \delta ^{-1}\}=\mathcal {O}(\delta ^{-1}))\)) are given in Table 1 using the above estimations of the gate count.
Table 1
Asymptotic gate count and ancilla count for different methods with respect to grid parameter n, polynomial degree p, and precision \(\delta \)
 
WAL
LIU/mLIU
PPP (\(p\ge 1\))
APK (\(p\ge 1\))
Asymptotic gate count
\(\mathcal {O}(\delta ^{-1})\)
\(\mathcal {O}(\delta ^{-1/2}n)\)
\(\mathcal {O}(\delta ^{-1/(p+1)}(n^p+(\log 1/\delta )^2))\)
\(\mathcal {O}((n+p)\delta ^{-1/(p+1)}+p(p+\log 1/\delta )^2)\)
Ancilla count
0
0/1
1
\(\mathcal {O}(1+p(p+\log 1/\delta ))\)
For large fixed n, it is clear that PPP and APK both provide the best order of \(\mathcal {{\widetilde{O}}}(\delta ^{-1/(p+1)}) = \mathcal {O}(\delta ^{-1/(p+1)}\textrm{poly}\log \delta ^{-1})\) for \(p\ge 2\). In particular, APK performs better than PPP in high-precision simulations at the cost of using \(\mathcal {O}(\log 1/\delta )\) ancillary qubits since the constant before \(\delta ^{-1/(p+1)}\) is independent of \(\delta \), and the dependence on n is linear in APK, while it is polynomial in PPP. This observation indicates the possible superiority of APK among the mentioned methods in the problem of high-precision simulations. However, as the near-term quantum devices do not afford such a large amount of logical qubits (hundreds of qubits due to [26]), we focus on WAL, LIU, and PPP, which employ at most one ancillary qubit and leave further discussion on APK to future work.
Table 2
Required resources for different methods with respect to grid parameter n and precision \(\delta \)
 
WAL
LIU/mLIU
PPP (\(p=1\))
PPP (\(p\ge 2\))
CNOT count
\(\mathcal {O}(\min \{2^n,\delta ^{-1}\})\)
\(\mathcal {O}(\delta ^{-1/2}n)\)
\(\mathcal {O}(\delta ^{-1/2}(n+(\log 1/\delta )^2))\)
\(\mathcal {O}(\delta ^{-1/(p+1)}(n^p+(\log 1/\delta )^2))\)
\(R_z\) count
\(\mathcal {O}(\min \{2^n,\delta ^{-1}\})\)
\(\mathcal {O}(\delta ^{-1/2}n)\)
\(\mathcal {O}(\delta ^{-1/2}(n+(\log 1/\delta )^2))\)
\(\mathcal {O}(\delta ^{-1/(p+1)}(n^p+(\log 1/\delta )^2))\)
Depth
\(\mathcal {O}(\min \{2^n,\delta ^{-1}\})\)
\(\mathcal {O}(\delta ^{-1/2}n)\)
\(\mathcal {O}(\delta ^{-1/2}(n+\log 1/\delta ))\)
\(\mathcal {O}(\delta ^{-1/(p+1)}(n^p+\log 1/\delta ))\)
Ancilla count
0
0/1
1
1
Next, we compare WAL, LIU, and PPP in detail and list the asymptotic gate count and depth as \(n\rightarrow \infty \) and \(\delta \rightarrow 0\) in Table 2. From Table 2, we find three facts. First, focusing on the first two rows, the \(R_z\) count has the same order as the CNOT count, which implies that it is sufficient to investigate the CNOT count when we discuss the asymptotic gate count. Second, focusing on the first and third rows for PPP, the depth has a slightly weaker dependence \(\delta ^{-1/(p+1)}\log 1/\delta \) than \(\delta ^{-1/(p+1)}(\log 1/\delta )^2\) for the CNOT count, but both the depth and the CNOT count have the order \(\mathcal {{\widetilde{O}}}(\delta ^{-1/(p+1)})\) if we do not care the less dominant factor. Third, focusing on the second and third columns, LIU/mLIU has less asymptotic gate count/depth than PPP with \(p=1\) concerning \(\delta \). If n is independent of \(\delta \), then PPP with large p provides less gate count for sufficiently small \(\delta \). However, it is not rare that the grid parameter n depends on the desired precision \(\delta \) considering the discretization error. In the following context, we tell how to choose the method regarding different \(n,\delta \) for given functions.

3.2 Case study for selected functions

The asymptotic gate count may not make sense because the grid parameter n and the desired precision \(\delta \) are chosen according to the practical applications. Here, we compare WAL, LIU, and PPP for two given functions with varying parameters n, \(\delta \).
Example 1
\(V(x) = A/\sqrt{a^2+(x-L/2)^2}\),    \(x\in [0,L]\).
The first case is the modified Coulomb potential peaked at the center of the interval with parameters A and a describing its amplitude and “singularity" near the peak, respectively. As an illustration, we choose \(A=1\), \(L=20\), and plot the CNOT count of the implementations by WAL, LIU, and PPP for a couple of precision \(\delta \) and parameter a. For PPP, we apply Algorithm 1 in “Appendix A.5” based on the analytic error of spline methods. PPPs with different polynomial degrees \(p=1,2,3\) are discussed, and we apply two-point Hermite interpolations [35] (the best ones with known optimal constants) for both \(p=2\) and \(p=3\) in the step of preparing piecewise polynomials by classical computation. The comparison results for different \(\delta \) and \(a^2=0.5, 0.1, 0.001\) are demonstrated in Fig. 4. We note that for \(n=m_0\), we have
$$\begin{aligned} \max _{j=0,1,\ldots ,N-1}|V(x_{j+1})-V(x_{j})|&= \max _{j=0,1,\ldots ,N-1}\left| \int _{x_j}^{x_{j+1}}V^{(1)}(x)dx\right| \\&\le L/2^n \Vert V^{(1)}\Vert _\infty = L/2^{m_0} \Vert V^{(1)}\Vert _\infty \le \delta . \end{aligned}$$
Fig. 4
(Color online) CNOT count vs. grid parameter n for modified Coulomb potential with several precision \(\delta \) and parameter \(a^2=0.5,0.1,0.001\) by different methods. The vertical dotted line denotes the parameter \(m_0\) defined by Eq. (2), which is the minimal division parameter m such that difference of V at any two adjacent knots of the \(2^{m}\) uniform division is smaller than \(\delta \). The choices of n are different for each subplot, and we choose n in \(\{m_1,m_1+1,\ldots , m_0+2\}\), where \(m_0,m_1\) are defined by Eqs. (2), (3). The potential functions with different \(a^2\) are given on the right side of each row for reference
Full size image
Then, for \(n>m_0\), we only need to apply the approximate circuit to the top \(m_0\) qubits to guarantee the desired precision \(\delta \). Hence, it is sufficient to consider the range of \(n\le m_0\). According to Fig. 4, we have the following observation:
(i)
We find WAL/LIU is the best one for small \(n\le m_1\) (LIU is reduced to WAL for \(n\le m_1\) by its construction). Moreover, for sufficiently large \(m_0\), LIU shows better performance than WAL after some \(m^*\le m_0\).
 
(ii)
Linear interpolation with LIU is usually better than that with PPP (\(p=1\)), but for the severe “singularity" (large derivatives) case, PPP with \(p=1\) can perform better than LIU.
 
(iii)
Although the asymptotic gate count with respect to \(\delta \) shows the superiority for large polynomial degree \(p\ge 2\), it is not correct for fixed \(\delta \) in practice due to the factor \(n^p\) that is exponentially increasing in p. Here, we find PPP with \(p=3\) does not outperform either PPP with \(p=2\) or LIU in any subplot. Moreover, PPP with \(p=2\) (more precisely, with quadratic Hermite spline) exhibits the best CNOT count for severe “singularity" and high-precision case when n is sufficiently large (see the subplots of row 3 and columns 2 and 3).
 
Example 2
\(V(x) = Ae^{-ax^2} \cos {\omega x} \),    \(x\in [0,L]\).
Fig. 5
(Color online) CNOT count vs. grid parameter n with the damped oscillating function with several precision \(\delta \) and parameter \(a=1,0.1,0.001\) by different methods. The vertical dotted line denotes the parameter \(m_0\) defined by Eq. (2), which is the minimal division parameter m such that difference of V at any two adjacent knots of the \(2^{m}\) uniform division is smaller than \(\delta \). The choices of n are different for each subplot, and we choose n in \(\{m_1,m_1+1,\ldots , m_0+2\}\), where \(m_0,m_1\) are defined by Eqs. (2), (3). The potential functions with different a are given on the right side of each row for reference
Full size image
The second case is a damped oscillating function with parameters A, a, and \(\omega \), where A is the scaling factor of amplitude, a describes the decreasing rate, and \(\omega \) denotes the frequency of the sinusoidal function. Similarly, we choose \(A=1\), \(L=20\), \(\omega =2\) and plot the CNOT count of the implementations by WAL, LIU, and PPP for a couple of precision \(\delta \) and parameter a. The comparison results for different \(\delta \) and \(a=1, 0.1, 0.001\) are demonstrated in Fig. 5. Again, we provide a plot of CNOT count with respect to \(n=m_1,m_1+1,\ldots , m_0+2\) for several choices of a and \(\delta \). The CNOT count in the case of \(n\ge m_0\) is regarded as constant for each subplot since it is enough to apply the approximate circuit to the top \(m_0\) qubits. Here, we use mLIU for the reason that V is not periodic. According to Fig. 5, we have a similar observation that regardless of the oscillation of the function, WAL/mLIU is suitable for the case of low precision and small derivatives, and PPP \((p=2)\) gives the least CNOT count for high-precision and large derivatives case.
We explain the interesting observation that PPP with \(p=2\) outperforms PPP with \(p=3\) in practical cases. For high-precision case, since we consider \(n=\{m_1,m_1+1,\ldots , m_0\}\), which implies \(n=\mathcal {O}(\log _2 \delta ^{-1})\). Then, the dominant gate count for PPP with \(p=2\) is \((\log _2 \delta ^{-1})^2\delta ^{-1/3}\), while it is \((\log _2 \delta ^{-1})^3\delta ^{-1/4}\) for PPP with \(p=3\). By noting that \(\delta ^{-1/12}\) is smaller than \(\log _2\delta ^{-1}\) for \(10^{-22}\le \delta \le 0.1\), we can verify that PPP with \(p=2\) outperforms PPP with a higher degree for most practical cases, e.g., \(\delta =10^{-3}\). More precisely, the smallest \(\delta \) such that PPP with \(p=2\) outperforms other PPPs depends on the given function, and we postpone a detailed analysis to “Appendix B.3.” To find the best method in the sense of minimizing CNOT count, we suggest the following steps. First, we calculate the parameters \(m_0,m_1,m_2,{\tilde{M}}_1\), and \({\tilde{M}}_2\) by the function V and precision \(\delta \). Next, we plot the CNOT count of WAL, LIU/mLIU (LIU if periodic and mLIU otherwise), and PPP (\(p=2\)), respectively, in terms of the analytic estimations (see “Appendix A” or a summary in “Appendix B.3”). Finally, we pick up the one with the least CNOT count according to the grid parameter n. Owing to our “sharp" analytic estimations of the parameters, the above calculations are cheap on a classical computer, and hence, it is possible to choose a suitable method prior to the construction of the quantum circuits.
In Figs. 4 and 5, we provide a detailed comparison only for the CNOT count. Since the implementations of WAL, LIU, and PPP are based on CNOT gates and phase gates, roughly speaking, the circuit depth is nearly proportional to the CNOT count. Then, we have a similar discussion for the depth (see “Appendix B.4” for more details).

4 Application to first-quantized Hamiltonian simulation

In this section, we apply the methods to Hamiltonian simulation and estimate the required resources regarding several parameters. We mainly focus on CNOT count since it is an important factor in describing the entanglement of the qubits, and it provides us with information on gate count and circuit depth according to our previous discussion.
As for the problem of first-quantized Hamiltonian simulation, we employ the well-known grid-based method using the so-called centered/shifted quantum Fourier transform \(U_{\text {CQFT}}\) to diagonalize the kinetic energy operator \(\hat{T}\). By noting that the potential energy operator \(\hat{V}\) itself is diagonal in the real-space representation, we combine this with the Trotter-Suzuki formula to obtain an approximation scheme [5, 23, 37, 38]. For instance, with the first-order Trotter-Suzuki formula, we have the following approximation
$$\begin{aligned} e^{-\textrm{i}\mathcal {H}K\Delta t} = (e^{-\textrm{i}\mathcal {H}\Delta t})^K \approx (e^{-\textrm{i}\hat{T}\Delta t} e^{-\textrm{i}\hat{V}\Delta t})^K = (U_{\text {CQFT}} U_{\text {kin}}(\Delta t)U_{\text {CQFT}}^{\dag } U_{\text {pot}}(\Delta t))^K, \end{aligned}$$
where \(U_{\text {kin}}(\Delta t):=U_{\text {CQFT}}^{\dag }e^{-\textrm{i}\hat{T}\Delta t} U_{\text {CQFT}}\), \(U_{\text {pot}}(\Delta t):=e^{-\textrm{i}\hat{V}\Delta t}\) are two diagonal unitary matrices with a time step parameter \(\Delta t\).

4.1 General formulation of a molecular system

We adopt the formulation introduced in [23] and consider a molecular system consisting of \(N_{\text {e}}\) electrons as quantum mechanical particles and \(N_{\text {nuc}}\) nuclei as classical point charges fixed at some known positions \(\textbf{R}_v\in \mathbb {R}^d\) (\(v=0,\ldots , N_{\text {nuc}}-1\)) where \(d\in \mathbb {N}\) is the spatial dimension. We assume these two kinds of particles interact with each other under pairwise interactions, which depend only on the distance between two particles. In other words, we consider the Hamiltonian given by
$$\begin{aligned} \mathcal {H}(\{\varvec{R}_v\}_v)&= {\hat{T}} + {\hat{V}}(\{\varvec{R}_v\}_v) = {\hat{T}} + {\hat{V}}_{\text {ee}} + {\hat{V}}_{\text {en}}(\{\varvec{R}_v\}_v) + E_{\text {nn}}(\{\varvec{R}_v\}_v) + {\hat{V}}_{\text {ext}} \\ &= \sum _{\ell =0}^{N_{\text {e}}-1} \frac{\varvec{{\hat{p}}}_{\ell }^2}{2m_{\text {e}}} + \frac{1}{2} \sum _{\ell ,\ell ^\prime =0, \ell \not =\ell ^\prime }^{N_{\text {e}}-1} v_{\text {ee}}(|\varvec{{\hat{r}}}_\ell -\varvec{{\hat{r}}}_{\ell ^\prime }|) + \sum _{\ell =0}^{N_{\text {e}}-1}\sum _{v=0}^{N_{\text {nuc}}-1} -Z_v v_{\text {en}}(|\varvec{{\hat{r}}}_\ell -\varvec{R}_v|) \\ &\quad + \frac{1}{2} \sum _{v,v^\prime =0, v\not =v^\prime }^{N_{\text {nuc}}-1} Z_{v}Z_{v^\prime } v_{\text {nn}}(|\varvec{R}_v-\varvec{R}_{v^\prime }|) + \sum _{\ell =0}^{N_{\text {e}}-1} v_{\text {ext}}(\varvec{{\hat{r}}}_\ell ). \end{aligned}$$
Here, \(\varvec{{\hat{r}}}_\ell \) and \(\varvec{{\hat{p}}}_\ell \) denote the position and momentum operators of the \(\ell \)-th electron, respectively, and \(m_{\text {e}}\) is the mass set to 1. Moreover, \(Z_v\) is the charge of the v-th nucleus, while we have already set the charge of the electrons to \(-1\). Although it is possible to consider the nuclei in quantum registers, in this paper, we consider only the electrons in quantum registers to minimize the number of required qubits. Furthermore, we can combine the parts \({\hat{V}}_{\text {en}}\), \(E_{\text {nn}}\), and \({\hat{V}}_{\text {ext}}\) as a potential
$$\begin{aligned} {\hat{V}}_{\text {e}}(\{\varvec{R}_v\}_v) := \sum _{\ell =0}^{N_{\text {e}}-1} v_{\ell }(\varvec{{\hat{r}}}_\ell ;\{\varvec{R}_v\}_v), \end{aligned}$$
where \(v_{\ell }\) is the potential felt by the \(\ell \)-th electron except for the electron–electron interactions.
We encode the \(N_{\text {e}}\)-electron wave function in real space using n qubits for each dimension per electron, as usual in the first-quantized/grid-based formalism. We collect \(d n N_{\text {e}}\) qubits as the electronic register: \({|{\varvec{k}_0}\rangle }_{dn} \otimes \cdots \otimes {|{\varvec{k}_{N_{\text {e}}-1}}\rangle }_{dn}\) where each \(\varvec{k}_\ell \) describes d integers of n-bit corresponding to the position eigenvalue \(\sum _{i=1}^d k_{\ell ,i}\varvec{e}_i \Delta x_i\). Here, \(\varvec{e}_i\) is the unit vector, \(\Delta x_i = L_i/2^n\), \(i=1,\ldots ,d\), and \(L_i\) is the length of the simulation cell in each dimension.
Fig. 6
Implementation of the real-time evolution operator \(U_{\text {RTE}}(\Delta t) = e^{-\textrm{i}\mathcal {H}\Delta t}\) by the Trotter-Suzuki formula. The left circuit is for the first-order Trotter-Suzuki formula, while the right one is for the second-order Trotter-Suzuki formula. For each dimension, we apply \(U_{\text {CQFT}}\) and \(U_{\text {kin}}(\Delta t)\) on the n-qubit registers, which means there are \(2d N_{\text {e}}\) operations \(U_{\text {CQFT}}\)/\(U_{\text {CQFT}}^\dag \) and \(d N_{\text {e}}\) operations \(U_{\text {kin}}(\Delta t)\) in total in the above circuits
Full size image
Fig. 7
Implementations of the electron–nucleus interaction \(e^{-\textrm{i}Z_v v_{\text {en}}(|\varvec{{\hat{r}}}_\ell -\varvec{R}_v|)\Delta t}\) for \(\ell =0,\ldots ,N_{\text {e}}-1\), \(v=0,\ldots ,N_{\text {nuc}}-1\) and the electron–electron interaction \(e^{-\textrm{i}v_{\text {en}}(|\varvec{{\hat{r}}}_\ell -\varvec{{\hat{r}}}_{\ell ^\prime }|)\Delta t}\) for \(\ell ,\ell ^\prime =0,\ldots ,N_{\text {e}}-1\), \(\ell \not =\ell ^\prime \). Here, \(n_{\text {anc}}\) is the number of ancillary qubits, and \(n_{\text {dis}}\) denotes the size of the distance register. Detailed circuits of \(U_{\text {dis},v}(\varvec{R}_v)\) and \(U_{\text {dis}}\) will be given in “Appendix C.2
Full size image
Under this formulation, the real-time evolution operator with time step \(\Delta t\) by the first-/second-order Trotter-Suzuki formula is shown in Fig. 6. Although there are \(N_{\text {e}}(N_{\text {e}}-1)\) electron–electron interactions and \(N_{\text {e}}\) single electron potentials in \(U_{\text {pot}}(\Delta t)\), we find that \(U_{\text {pot}}(\Delta t)\) can be implemented in linear depth of \(\mathcal {O}(N_{\text {e}})\) by parallelizing the implementations even without ancillary registers (see “Appendix C.1”). In general, implementation of the single electron potential \(e^{-\textrm{i}v_\ell (\varvec{{\hat{r}}}_\ell ; \{\varvec{R}_v\}_v)\Delta t}\) needs a phase gate for a d-variable function. If \({\hat{V}}_{\text {ext}}\equiv 0\), then the electron–electron/electron–nucleus interaction can be implemented using a distance register as Fig. 7. Here, we put the part of \(E_{\text {nn}}\) for the v-th nucleus into \(U_{\text {en},v}(\Delta t)\), and \(U_{\text {en},v}(\Delta t)\) and \(U_{\text {ee}}(\Delta t)\) are both diagonal unitary operations that the methods in Sect. 2 work. As for \(U_{\text {dis},v}(\varvec{R}_v)\) and \(U_{\text {dis}}\), there are usual ways based on arithmetic operations. Besides, we also propose implementations based on QFTs and polynomial phase gates to minimize the number of required qubits. The details are provided in “Appendix C.2” for completeness.

4.2 Resource estimation

We estimate the required resources for the implementation using the Trotter-Suzuki formula. For a given error bound \(\varepsilon >0\) and simulation time t, we are interested in finding an approximate circuit for \(e^{-\textrm{i}\mathcal {H}t}\) based on the second-order Trotter-Suzuki formula such that
$$\begin{aligned} \left\| e^{-\textrm{i}\mathcal {H}t}-({\tilde{U}}_{\text {pot}}(t/(2K)) U_{\text {CQFT}}^{\otimes d} U_{\text {kin}}^{\otimes d}(t/K)U_{\text {CQFT}}^{\dag \otimes d} {\tilde{U}}_{\text {pot}}(t/(2K)))^K\right\| \le \varepsilon . \end{aligned}$$
In other words, we find \(K\in \mathbb {N}\) and some approximate unitary diagonal operation \({\tilde{U}}_{\text {pot}}\) to \(U_{\text {pot}}\), which depends on the error bound \(\varepsilon \).
As we know, there are three types of errors in the theoretical formulation of the first-quantized Hamiltonian simulation based on the Trotter-Suzuki formula.
Discretization error Roughly speaking, this is an error between the discrete wave function and the continuous wave function subject to the Schrödinger equation, which decreases exponentially as the grid parameter n increases [37]. Theoretically, one can regard this error as the truncation error of the Fourier series expansion, but this is beyond the interest of this paper. As a result, we assume that the grid parameter is suitably determined, and we consider only the error between the operation \(e^{-\textrm{i}H_n t}\) and its approximation where \(H_n\) is the \(2^n \times 2^n\)-discretized Hamiltonian matrix of \(\mathcal {H}\). That is, we consider
$$\begin{aligned} \left\| e^{-\textrm{i}H_n t}-({\tilde{U}}_{\text {pot}}(t/(2K)) U_{\text {CQFT}}^{\otimes d} U_{\text {kin}}^{\otimes d}(t/K)U_{\text {CQFT}}^{\dag \otimes d} {\tilde{U}}_{\text {pot}}(t/(2K)))^K\right\| \le \varepsilon . \end{aligned}$$
(4)
Trotter-Suzuki error According to the error estimate for the second-order Trotter-Suzuki formula [39], we have
$$\begin{aligned}&\left\| e^{-\textrm{i}H_n t}-(U_{\text {pot}}(t/(2K)) U_{\text {CQFT}}^{\otimes d} U_{\text {kin}}^{\otimes d}(t/K)U_{\text {CQFT}}^{\dag \otimes d} U_{\text {pot}}(t/(2K)))^K\right\| \\&\quad =\left\| e^{-\textrm{i}(T_n+V_n) t}-\left( e^{-\textrm{i}V_n t/(2K)} e^{-\textrm{i}T_n t/(2K)} e^{-\textrm{i}V_n t/(2K)}\right) ^K\right\| \le CK(t/K)^3, \end{aligned}$$
where C is some constant depending on the discretized matrices \(T_n\) and \(V_n\). If we assume the right-hand side above is upper bound by \(\varepsilon /2\), then we reach an estimation of K as
$$\begin{aligned} K = \mathcal {O}(t^{3/2}\varepsilon ^{-1/2}). \end{aligned}$$
Approximation error By noting the unitarity of the operations \(e^{-\textrm{i}T_n t/K}\) and \(e^{-\textrm{i}V_n t/(2K)}\), we estimate
$$\begin{aligned}&\Big \Vert ({\tilde{U}}_{\text {pot}}(t/(2K)) U_{\text {CQFT}}^{\otimes d} U_{\text {kin}}^{\otimes d}(t/K)U_{\text {CQFT}}^{\dag \otimes d} {\tilde{U}}_{\text {pot}}(t/(2K)))^K \\&\quad -(U_{\text {pot}}(t/(2K)) U_{\text {CQFT}}^{\otimes d} U_{\text {kin}}^{\otimes d}(t/K)U_{\text {CQFT}}^{\dag \otimes d} U_{\text {pot}}(t/(2K)))^K\Big \Vert \\&\qquad \le 2K\left\| {\tilde{U}}_{\text {pot}}(t/(2K))-U_{\text {pot}}(t/(2K))\right\| \le 2K t/(2K) \Vert {\tilde{V}}_n - V_n\Vert _\infty = t \Vert {\tilde{V}}_n - V_n\Vert _\infty . \end{aligned}$$
According to Fig. 7, we apply the approximate circuit with precision \(\delta \) for each implementation of the electron–nucleus/electron–electron interaction part. Taking the additive error into account, we have
$$\begin{aligned} t \Vert {\tilde{V}}_n - V_n\Vert _\infty \le t \delta (N_{\text {e}} N_{\text {nuc}}+N_{\text {e}}(N_{\text {e}}-1)/2), \end{aligned}$$
which implies \(\delta = \varepsilon /(t(2N_{\text {e}} N_{\text {nuc}}+N_{\text {e}}(N_{\text {e}}-1)))\) so that \(t\Vert {\tilde{V}}_n - V_n\Vert _\infty \le \varepsilon /2\), and hence, Eq. (4) is satisfied.
The distance register needs \(2n+\lceil \log _2 d\rceil \) qubits to describe a value in \([0, dL^2)\), which yields the asymptotic gate count of PPP (see “Appendix A.3” with L replaced by \(dL^2\)) for each approximation:
$$\begin{aligned}&\mathcal {O}\left( dt^{1/(p+1)}N_{\text {tot}}^{2/(p+1)}\varepsilon ^{-1/(p+1)}(n^p(\log d)^p+(\log (dtN_{\text {tot}}^2/\varepsilon )^2))\right) \\&\quad = \mathcal {{\widetilde{O}}}\left( dn^p t^{1/(p+1)}N_{\text {tot}}^{2/(p+1)}\varepsilon ^{-1/(p+1)}\right) , \end{aligned}$$
concerning the parameters \(n, d, t, N_{\text {tot}},\varepsilon \) where \(N_{\text {tot}}=N_{\text {e}}+N_{\text {nuc}}\) is the total number of particles. The hidden linear dependence on d comes from the number of sub-intervals \({\tilde{M}}\) because \({\tilde{M}}\) is proportional to \(dL^2\) in the worst case. In fact, for a “localized" potential such that the integral regarding its derivative is independent of d, \({\tilde{M}}\) is uniform with respect to d, and the above linear dependency will vanish. Moreover, the circuit depth is in the same order as the gate count due to Table 2. As for the second-order Trotter-Suzuki formula, we need \(K+1\) times implementations of \({\tilde{U}}_{\text {pot}}\) and \(KdN_{\text {e}}\) times implementations of \(U_{\text {kin}}\) (an operation on n qubits). Therefore, the total gate count of the Hamiltonian simulation is \(\mathcal {{\widetilde{O}}}\left( d N_{\text {tot}}^{2+2/(p+1)}t^{3/2+1/(p+1)}\varepsilon ^{-1/2-1/(p+1)}\right) \), and its circuit depth is \(\mathcal {{\widetilde{O}}}\left( d N_{\text {tot}}^{1+2/(p+1)}t^{3/2+1/(p+1)}\varepsilon ^{-1/2-1/(p+1)}\right) \). Here, the dependence on \(N_{\text {tot}}\) for the depth is one order smaller than that for the gate count because we use parallel implementations of the potential part \({\tilde{U}}_{\text {pot}}(t/(2K))\) we mentioned in the last paragraph of Sect. 4.1, and this saves the depth by a factor of \(N_{\text {e}}\). However, we need \((2n+1+\lceil \log _2 d\rceil )\) ancillary qubits for each electron in this case. Based on PPP and QFT-based/arithmetic-operation-based implementations of the distance registers, we summarize the required resources for the Hamiltonian simulation (\(d\ge 2\)) in Table 3. We have omitted the polynomial dependence \(\mathcal {O}(n^p)\) in gate count/circuit depth because we apply the approximate circuit only to the top \(m_0\) qubits of the circuit if \(n>m_0\). According to the definition of \(m_0\) in Eq. (2), this yields a negligible factor in gate count/circuit depth of \(\mathcal {O}((\log \varepsilon ^{-1})^{p})\).
Table 3
Resource estimation for Hamiltonian simulation with second-order Trotter-Suzuki formula
 
Gate count
Depth
Ancilla count
Qubit count
QFT (sequential)
\(\mathcal {{\widetilde{O}}}\left( d N_{\text {tot}}^{2+2/(p+1)}t^{3/2+1/(p+1)}\varepsilon ^{-1/2-1/(p+1)}\right) \)
\(\mathcal {{\widetilde{O}}}\left( d N_{\text {tot}}^{2+2/(p+1)}t^{3/2+1/(p+1)}\varepsilon ^{-1/2-1/(p+1)}\right) \)
\(2n+1+\lceil \log _2 d\rceil \)
\((dN_{\text {e}}+2)n+1+\lceil \log _2 d\rceil \)
Arithm. (sequential)
\(\mathcal {{\widetilde{O}}}\left( d N_{\text {tot}}^{2+2/(p+1)}t^{3/2+1/(p+1)}\varepsilon ^{-1/2-1/(p+1)}\right) \)
\(\mathcal {{\widetilde{O}}}\left( d N_{\text {tot}}^{2+2/(p+1)}t^{3/2+1/(p+1)}\varepsilon ^{-1/2-1/(p+1)}\right) \)
\(2dn+d(d+1)/2\)
\((N_{\text {e}}+2)dn+d(d+1)/2\)
QFT (parallel)
\(\mathcal {{\widetilde{O}}}\left( d N_{\text {tot}}^{2+2/(p+1)}t^{3/2+1/(p+1)}\varepsilon ^{-1/2-1/(p+1)}\right) \)
\(\mathcal {{\widetilde{O}}}\left( d N_{\text {tot}}^{1+2/(p+1)}t^{3/2+1/(p+1)}\varepsilon ^{-1/2-1/(p+1)}\right) \)
\(N_{\text {e}}(2n+1+\lceil \log _2 d\rceil )\)
\(N_{\text {e}}((d+2)n+1+\lceil \log _2 d\rceil )\)
Arithm. (parallel)
\(\mathcal {{\widetilde{O}}}\left( d N_{\text {tot}}^{2+2/(p+1)}t^{3/2+1/(p+1)}\varepsilon ^{-1/2-1/(p+1)}\right) \)
\(\mathcal {{\widetilde{O}}}\left( d N_{\text {tot}}^{1+2/(p+1)}t^{3/2+1/(p+1)}\varepsilon ^{-1/2-1/(p+1)}\right) \)
\(N_{\text {e}}(2dn+d(d+1)/2)\)
\(N_{\text {e}}(3dn+d(d+1)/2)\)
To clarify, we compare and emphasize the difference between our estimation and the result in [37, Theorem 6]. Childs et al. [37] discussed the gate complexity considering the discretization and the Trotter-Suzuki errors by neglecting the approximation error in implementing the potential part. In comparison, as we mentioned, our main target in this paper is how the approximation error contributes to the required resources. Thus, we focus on the Trotter-Suzuki and the approximation errors in our estimation by assuming that n is suitably determined. Using the second-order Trotter-Suzuki formula, [37, Theorem 6] gives the asymptotic gate count \(\mathcal {{\widetilde{O}}}\left( (dN_{\text {tot}})^{5/2}t^{3/2}\varepsilon ^{-1/2}\right) \) in our notation, where the worse \((dN_{\text {tot}})\)-dependence owes to the discretization error. Then, our result demonstrates an additional dependence of \(\mathcal {{\widetilde{O}}}\left( (t/\varepsilon )^{1/(p+1)}\right) \) in gate count/circuit depth if the approximation error is addressed (using piecewise p-th degree polynomial approximation). As we discussed in Sect. 3, p usually takes a small integer, e.g., \(p=2\) in practical cases. Then, our result shows a comparable polynomial dependence in gate count/depth coming from the approximation error as that coming from the Trotter-Suzuki error, which is a novel finding that one has to consider the approximation error in addition to the Trotter-Suzuki error.
We end this subsection with a crucial remark on the choice of grid parameter n. Although one can theoretically estimate the discretization error [37], it is not helpful in practice since the result gives only the order, and it seems impossible to derive a sharp number of n even for a detailed Hamiltonian. Thus, the grid parameter n will probably be chosen by testing or by experience in practice. Since our resource estimation is based on PPP, it is effective when we assume small error \(\varepsilon \) and large n. In contrast, if \(\varepsilon \) and n lie in the region where the exact implementation (WAL with \(m_0=n\)) is selected, then we have no approximation error and the total gate count of the Hamiltonian simulation is \(\mathcal {{\widetilde{O}}}\left( d N_{\text {tot}}^2 t^{3/2}\varepsilon ^{-1/2}\right) \).

5 Conclusion

In this paper, we focus on a systematic study of the approximate implementation of the real-time evolution operator for potentials with at most one ancillary qubit for a given error bound, which is an important task for first-quantized Hamiltonian simulation on a near-term quantum computer.
First, we summarize the previous methods WAL (sequency-ordered WALsh operators) and APK (Arithmetic operations and Phase Kickback) and propose an original method LIU (Linearly Interpolated Unitary diagonal matrix) and a modified method PPP (Phase gate for Piecewise-defined Polynomial). Using the explicit implementation of each method, we derive the asymptotic gate count and circuit depth regarding grid parameter n and precision \(\delta \) and find that PPP with a high-degree polynomial outperforms WAL/LIU in the order of precision. Moreover, APK has an even better performance than PPP in the scaling of the pre-factor at the cost of using \(\mathcal {O}(\log 1/\delta )\) (\(=\mathcal {O}(n)\) if \(\delta \propto 1/2^n\)) ancillary qubits.
Second, we conducted a case study with WAL, LIU, and PPP for the modified Coulomb potential and a damped oscillating function. By employing the theoretical optimal constants in the error estimation for spline interpolation, we divided the interval of n into three sub-intervals \(I_1=[1,m_1)\), \(I_2=[m_1,m_0]\), and \(I_3=(m_0,\infty )\). Here, \(m_0\) and \(m_1\) are integers calculated from \(\delta \), V, and the optimal constants. In a practical setting, our results show that for \(n\in I_1\), LIU is equivalent to WAL and WAL/LIU is better than PPP. Moreover, for \(n\in I_3\), it is sufficient to apply the approximate circuit to the top \(m_0\) qubits. Hence, we only need to compare the gate count/depth for different methods with \(n=m_0\). Furthermore, for \(n\in I_2\), we determine whether PPP outperforms WAL and LIU depending on \(\delta \), n, and the norms of the derivatives of the potential. If the norms are sufficiently large and \(\delta \) is sufficiently small, then there is some integer \(n^*\in I_2\) such that PPP is better than WAL and LIU for \(n\ge n^*\). Among the PPPs with different polynomial degrees, it appears that \(p=2\) (quadratic Hermite interpolation) works best in most practical settings of error bound \(\delta \).
Third, we adopted the formulation for a molecular system in [23], and by applying the grid-based method, we estimated the gate count and circuit depth for the first-quantized Hamiltonian simulation. Because we are keeping in mind the high-precision case, resources estimation based on PPP is addressed. In addition to the gate count of \(\mathcal {{\widetilde{O}}}\left( t^{3/2}\varepsilon ^{-1/2}\right) \) concerning the second-order Trotter-Suzuki error, we show an extra overhead of \(\mathcal {{\widetilde{O}}}\left( t^{1/(p+1)}\varepsilon ^{-1/(p+1)}\right) \) resulting from the approximation error, where t is the simulation time, \(\varepsilon \) is the error bound, and p is the polynomial degree. This implies that the approximation error is not negligible for long-time simulations under the requirement of a small error bound.
Finally, we provided comprehensive references and novel discussions on the technical details of the implementations, as well as their gate counts, in appendix. In addition, we proposed two closely related algorithms for finding the coefficients of polynomials on a classical computer, which helped us develop a complete numerical scheme for deriving an approximate quantum circuit of the real-time evolution operator from the input potential V, grid parameter n, polynomial degree p, and precision \(\delta \).
As an extension of this study, it may be interesting to consider a case in which the potential is given by a general multivariable function. For the one-variable (distance) potential discussed in this paper, distance registers must be implemented, as shown in Fig. 7. In particular, this yields an additional cost and introduces a linear dependence on \(N_{\text {nuc}}\) in the gate count and circuit depth because the distance registers are derived one by one. In a general setting of multivariable potential, one can implement the electron–nucleus interaction part and external potential part simultaneously, which seems to be more efficient.
In this paper, we focus on methods with minimal circuit width (number of qubits). As we found in the first part of Sect. 3.1, APK is a promising method for a high-precision case in the sense that its asymptotic gate count and circuit depth would be even smaller at the cost of using many ancillary qubits. Therefore, there seems to be a trade-off between circuit depth and width. A detailed discussion in this direction is indispensable for practical applications because of the limitation on the maximal depth/width of circuits that can be processed on near-term quantum computers.

Declarations

Conflict of interest

The authors declare no conflict of interest that could have appeared to influence this work.

Code availability

Please contact the corresponding author for the code availability.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://​creativecommons.​org/​licenses/​by/​4.​0/​.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Title
Approximate real-time evolution operator for potential with one ancillary qubit and application to first-quantized Hamiltonian simulation
Authors
Xinchi Huang
Taichi Kosugi
Hirofumi Nishi
Yu-ichiro Matsushita
Publication date
01-03-2025
Publisher
Springer US
Published in
Quantum Information Processing / Issue 3/2025
Print ISSN: 1570-0755
Electronic ISSN: 1573-1332
DOI
https://doi.org/10.1007/s11128-025-04697-7

Details on the methods

Sequency-ordered Walsh operators

In Zhang et al. [17], the authors optimized the previous circuit [1214] in depth by parallelizing quantum gates according to a constructive algorithm. For completeness, we give the circuit for \(n=4\) as an example in Fig. 8. For a \(2^n\times 2^n\) unitary diagonal matrix, such an implementation uses a global phase gate, \(2^n-1\) \(R_z\) gates, and \(2^n-2\) CNOT gates with depth \(2^n\). On the other hand, Welch et al. suggested an approximate circuit by applying the Walsh operators only to the top \(m<n\) qubits [14]. Since this is equivalent to conducting a piecewise constant approximation in a \(2^m\) uniform grid, we can estimate m for a given precision \(\delta \) as
$$\begin{aligned} \max _{x\in [0,L]}|V^{(1)}(x)|\frac{L}{2^m} \le \delta , \end{aligned}$$
Fig. 8
Example of quantum circuit by sequency-ordered Walsh operators with \(n=4\) [17]. Here, \(\varphi _j\) is the rotation angle derived by applying the Hadamard matrix to the diagonal vector of the discretized potential matrix
Full size image
which implies that the difference of V at arbitrary two adjacent grid points is upper bound by \(\delta \). Thus, we can take
$$\begin{aligned} m = \Bigg \lceil \log _2 \left( L\frac{\Vert V^{(1)}\Vert _\infty }{\delta }\right) \Bigg \rceil . \end{aligned}$$
Here and henceforth, \(V^{(k)}\) is the k-th derivative of V, and \(\Vert \cdot \Vert _\infty \) denotes the maximum norm. In this sense, the approximate circuit by [14] uses a global phase gate, \(2^m-1\) \(R_z\) gates, and \(2^m-2\) CNOT gates. Moreover, the depth based on the gate set \(\{\textrm{CNOT}, R_z\}\) is \(2^m\). [14] also mentioned the possibility of the reduction by truncating the Walsh operators with small coefficients, but this would introduce an additional truncation error that is hard to estimate analytically.
In order to avoid misleading, we mention that the above choice of parameter m, as well as those in the following subsections, is sufficient to guarantee the given precision. Although it is possible to choose even smaller m if \(\delta \) is relatively large, we find that the above choice of m becomes optimal, that is, the precision cannot be achieved for any integer strictly smaller than m, as \(\delta \) tends to 0. We refer to the last paragraph in “Appendix B.1” for details.

Linearly interpolated unitary diagonal matrix

According to Fig. 2, LIU uses \(2(n-m)+1\) queries to the \(\mathbb {C}^{M\times M}\) unitary diagonal operation and \(2(n-m)\) controlled increment/decrement gates on m qubits. The former can be realized by \(2^m-1\) \(R_z\) gates and \(2^m-2\) CNOT gates with depth \(2^m\) for each query [17], whereas the latter can be directly implemented by a couple of multi-controlled NOT gates using Fig. 9. Then, LIU needs a global phase gate, \((2^m-1)(2(n-m)+1)\) \(R_z\) gates, \((2^m-2)(2(n-m)+1)+2(n-m)\) CNOT gates, and \(2(n-m)\) j-controlled NOT gates for each \(j=2,3,\ldots ,m\). Using several ancillary qubits, j-controlled NOT gates can be realized by CNOT gates and single-qubit gates of linear order \(\mathcal {O}(j)\) [41, 42]. However, since we try to avoid using ancillas in this paper, we mention another way to implement \(U_{\text {+1}}\) based on a quantum Fourier transform (QFT), its inverse, and phase gates, which one can find in [34, Fig. 2].
Fig. 9
Quantum circuit of increment gate \(U_{\text {+1}}\) and its controlled version according to Exercise 4.27 in [40]
Full size image
If one applies the conventional way for the QFT and its inverse, then one needs a global phase gate, 2m Hadamard gates, m phase gates, and \(m(m-1)\) controlled phase gates for a single increment gate. Moreover, a controlled increment gate can be constructed by a global phase gate, 2m Hadamard gates, and \(m^2\) controlled phase gates since we only need to apply the control to the part of the phase gates. Under this implementation, LIU alternatively uses a global phase gate, \(4m(n-m)\) Hadamard gates, \((2^m-1)(2(n-m)+1)\) phase gates, \((2^m-2)(2(n-m)+1)\) CNOT gates, and \(2m^2(n-m)\) controlled phase gates. Based on the gate set \(\{\textrm{H}, \textrm{CNOT}, R_z\}\), the circuit depth is upper bound by \(2^m(2(n-m)+1)+2(n-m)(16m-16)\) because the depth of a QFT circuit or its inverse is \(8m-10\), and the controlled phase gates between the QFT and its inverse can be partly implemented with the QFT in parallel, which introduces only an additional depth of 4.
Next, we determine the parameter m by a given precision \(\delta >0\). In terms of a classical error estimate for linear interpolation, e.g., [35, 43], we have
$$\begin{aligned} \max _{j=0,1,\ldots ,N-1}|v_j-{\tilde{v}}_j| \le \frac{1}{8}\max _{x\in [0,L]}|V^{(2)}(x)| \left( \frac{L}{M}\right) ^2 \le \delta , \end{aligned}$$
where \(V\in C^2[0,L]\) is a sufficiently smooth function. Here, \({\tilde{v}}_j\) takes the value of the linear interpolation function \({\tilde{V}}\) at each uniform knot \(x_j = jL/N\), \(j=0,1,\ldots ,N\). Therefore, we have the estimation of m by
$$\begin{aligned} m = \Bigg \lceil \log _2 \left( L\sqrt{\frac{\Vert V^{(2)}\Vert _\infty }{8\delta }}\right) \Bigg \rceil . \end{aligned}$$
(A1)
If the above integer is larger than n, then we simply choose \(m=n\). In other words, we do not execute linear interpolation, and LIU is equivalent to WAL.

Phase gate for piecewise-defined polynomial

Similarly to [23], we provide an efficient circuit based on multi-controlled phase gates for a single polynomial phase gate:
$$\begin{aligned} U_{\text {ph}}[f] = \sum _{j=0}^{N-1} \textrm{exp}\left( -\textrm{i}f(x_j)\right) {|{j}\rangle }{\langle {j}|}. \end{aligned}$$
Here, we assume that f is a p-th degree real-valued polynomial function satisfying
$$\begin{aligned} f(x) = \sum _{k=0}^{p} a_k x^k, \quad x\in [0,L], \end{aligned}$$
for \(a_k\in \mathbb {R}\), \(k=0,1,\ldots ,p\). Recall that \(x_j = jL/N\), and \(j\in \{0,1,\ldots ,N-1\}\) admits the binary representation \([j_{n-1}j_{n-2}\cdots j_0]\). Then, we calculate
$$\begin{aligned} a_k x_j^k&= a_k j^k (L/N)^k \\&= a_k (L/N)^k \left( \sum _{\ell =0}^{n-1}j_\ell 2^{\ell }\right) ^k \\&= \sum _{\ell _0=0}^{n-1}\cdots \sum _{\ell _{k-1}=0}^{n-1} a_k (L/N)^k 2^{\ell _0+\cdots +\ell _{k-1}} \delta _{j_{\ell _0},1} \cdots \delta _{j_{\ell _{k-1}},1}. \end{aligned}$$
Thus, we have
$$\begin{aligned} U_{\text {ph}}[f] {|{j}\rangle }&= \textrm{exp}\left( -\textrm{i}f(x_j)\right) {|{j}\rangle } \\ &= \prod _{k=0}^p \textrm{exp}\left( -\textrm{i}a_k x_j^k\right) {|{j}\rangle } \\ &= \prod _{k=0}^p \left( \prod _{\ell _0=0}^{n-1}\cdots \prod _{\ell _{k-1}=0}^{n-1} \textrm{exp}\left( -\textrm{i}a_k (L/N)^k 2^{\ell _0+\cdots +\ell _{k-1}} \delta _{j_{\ell _0},1} \cdots \delta _{j_{\ell _{k-1}},1}\right) \right) {|{j}\rangle } \\ &= \prod _{k=0}^p \left( \prod _{\ell _0=0}^{n-1}\cdots \prod _{\ell _{k-1}=0}^{n-1} \textrm{C}Z_{\ell _0,\ldots ,\ell _{k-1}}\left( -a_k (L/N)^k 2^{\ell _0+\cdots +\ell _{k-1}}\right) \right) {|{j}\rangle }, \end{aligned}$$
where \(\textrm{C}Z_{\ell _0,\ldots ,\ell _{k-1}}\left( \theta \right) \) denotes a (multi-)controlled phase gate, which yields a phase factor \(e^{\textrm{i}\theta }\) if and only if the \(\ell _0\)-th, ..., \(\ell _{k-1}\)-th qubits are all in state \({|{1}\rangle }\) (repeated indices are allowed). Equivalently, it conducts \(Z(\theta ) = {|{0}\rangle }{\langle {0}|}+e^{\textrm{i}\theta }{|{1}\rangle }{\langle {1}|}\) to an arbitrary qubit among \({|{q_{\ell _0}}\rangle },\ldots ,{|{q_{\ell _{k-1}}}\rangle }\) with the others playing the role of control qubits. Therefore, the implementation of \(U_{\text {ph}}[f]\) uses at most \(\sum _{k=0}^p n^k = (n^{p+1}-1)/(n-1)\) (multi-)controlled phase gates. More efficiently, the (multi-)controlled phase gates with the same target and control qubits can be combined into a single (multi-)controlled phase gate by adding their rotation angles, which implies that we need only \(\sum _{k=1}^{\min \{p,n\}} \textrm{C}_k^n\) (multi-)controlled phase gates (i.e., \(\textrm{C}_k^n\) \((k-1)\)-controlled phase gates for \(k=1,2,\ldots ,\min \{p,n\}\) with a global phase gate) instead (Fig. 10). Here, \(\textrm{C}_k^n\) denotes the number of k-combinations of an n-element set. For \(p<n\), the total number of (multi-)controlled phase gates is both \(\mathcal {O}(n^p)\), while for \(p\ge n\), the total number of (multi-)controlled phase gates is reduced from \(\Omega (n^n)\) to \(2^n\). If we adopt the techniques for a generic multi-controlled single-qubit gate proposed by Silva and Park [36], the depth of a j-controlled phase gate is \(\mathcal {O}(j)\) even without any ancillary qubit. As a result, the depth of the total circuit for a p-th degree polynomial phase gate \(U_{\text {ph}}[f]\) is \(\mathcal {O}(pn^p)\) provided that \(p<n\).
Fig. 10
An example of polynomial phase gate \(U_{\text {ph}}[f]\) with \(p=2\) and \(n=4\). Here, \(\ell =L/N\) is the size of the sub-intervals. By combining the related (multi-)controlled gates and omitting the global phase, the number of (multi-)controlled gates is reduced from \(\sum _{k=1}^2 4^k = 20\) to \(\textrm{C}_1^4+\textrm{C}_2^4 = 10\)
Full size image
Now assume the piecewise polynomial approximation
$$\begin{aligned} {\tilde{V}}(x) = \sum _{\ell =0}^{{\tilde{M}}-1} \chi _{[{\tilde{x}}_{\ell }, {\tilde{x}}_{\ell +1})}(x) f_{\ell }(x), \quad x\in [0,L], \end{aligned}$$
where \(f_\ell \) is a polynomial function with degree \(p_{\ell }\) for each \(\ell =0,1,\ldots ,{\tilde{M}}-1\). According to the above discussion, the phase gate for a p-th degree polynomial is implemented by a global phase gate, \(\textrm{C}_1^n\) phase gates, \(\textrm{C}_2^n\) controlled phase gates, \(\textrm{C}_3^n\) 2-controlled phase gates, ..., \(\textrm{C}_{\min \{p,n\}}^n\) \((\min \{p,n\}-1)\)-controlled phase gates (Fig. 10). Similarly, the controlled phase gate for a p-th degree polynomial consists of a phase gate, \(\textrm{C}_1^n\) controlled phase gates, \(\textrm{C}_2^n\) 2-controlled phase gates, \(\textrm{C}_3^n\) 3-controlled phase gates, ..., \(\textrm{C}_{\min \{p,n\}}^n\) \(\min \{p,n\}\)-controlled phase gates (Fig. 11). On the other hand, the quantum comparator with a given integer in [34] is explicitly built by a QFT and its inverse quantum Fourier transform (IQFT) on m qubits, a QFT and its IQFT on \(m+1\) qubits, as well as \(2m+1\) phase gates. A conventional implementation of QFT on m qubits requires m Hadamard gates, \(m(m-1)/2\) controlled phase gates with a possible global phase gate, and \(\lfloor m/2\rfloor \) swap gates. Since the swap gates of QFT can be canceled in the circuit in [34] by the swap gates of its inverse, we find that the quantum comparator uses (at most) a global phase gate, \(4m+2\) Hadamard gates, \(2m+1\) phase gates, and \(2m^2\) controlled phase gates. In particular, for \(p=1\), PPP requires a global phase gate, \((8m+4)({\tilde{M}}-1)\) Hadamard gates, \(n+(4m+3)({\tilde{M}}-1)\) phase gates, and \((4m^2+n)({\tilde{M}}-1)\) controlled phase gates. Based on the gate set \(\{\textrm{H}, \textrm{CNOT}, R_z\}\), the depth of \(U_{\text {ph}}\) is 1, the depth of controlled \(U_{\text {ph}}\) is (at most) \(3n+1\le 4n\) (each controlled phase gate consists of 3 phase gates and 2 CNOT gates with depth 4, see Fig. 12, the first phase gate on the target qubit in the decomposition of these controlled phase gates can be implemented in parallel with the controlled global phase gate, which is a phase gate applied on the control qubit), and the depth of the comparator [34] is \(16m+16m-16\) because the depth of a QFT/IQFT on k qubits is \(8k-10\), \(k=m,m+1\), and the controlled phase gates between the QFT and IQFT can be partly implemented in parallel with the QFT, which introduces an additional depth of 4. Therefore, the circuit depth is bound by \(({\tilde{M}}-1)(4n+64\,m-32)+1\) for \(p=1\). As for \(p=2\), PPP requires a global phase gate, \((8m+4)({\tilde{M}}-1)\) Hadamard gates, \(n+3n(n-1)/2+(7n(n-1)/2+3n+12m^2+4m+3)({\tilde{M}}-1)\) phase gates, and \(n(n-1)+(4n(n-1)+2n+8m^2)({\tilde{M}}-1)\) CNOT gates. The depth of \(U_{\text {ph}}\) is (at most) 4n, the depth of controlled \(U_{\text {ph}}\) is (at most) \((6n-2)n\), and thus, the circuit depth is upper bound by \(({\tilde{M}}-1)(6n^2-2n+64m-32)+4n\). If we consider the parallel implementation for the polynomial phase gate (i.e., \(n(n-1)/2\) controlled phase gates and n phase gates can be implemented in depth \(\mathcal {O}(n)\) for which we use the same technique illustrated in Appendix C.1), then we can use the depth-efficient circuit for multiple controlled phase gates with the same control qubit [44] to reduce the depth from \(\mathcal {O}({\tilde{M}} n^2)\) to \(\mathcal {O}({\tilde{M}} n\log _2 n)\). For general \(p\ge 2\), the gate count and depth depend also on the polynomial degree p. For example, a naive implementation of a p-controlled phase gate without any ancillary qubits uses \(\mathcal {O}(p^2)\) single-qubit gates and CNOT gates. The depth in this case is at most \(\mathcal {O}({\tilde{M}} n^p)\). More precisely, we mention that the above estimations are based on a common implementation of a multi-controlled phase gate, and thus give an overhead on the gate count/depth of PPP. In this paper, the CNOT gates for 2-controlled and 3-controlled phase gates are counted as 8 and 20, respectively. Although they can be reduced to 6 and 14, separately, using smarter implementations [45], this does not change the crucial dependence on n, \(\delta \), etc.
Fig. 11
An example of controlled polynomial phase gate with \(p=2\) and \(n=4\). Here, \(l = L/N\) is the size of the sub-intervals
Full size image
Fig. 12
A direct decomposition of a controlled phase gate and a 2-controlled phase gate. The controlled phase gate can be decomposed into 3 phase gates and 2 CNOT gates with depth 4, and the 2-controlled phase gate can be decomposed into 7 phase gates and 8 CNOT gates with depth 12
Full size image
Next, we estimate the parameters m and \({\tilde{M}}\) for a given precision \(\delta >0\). Here, we fix the polynomial degree \(p\ge 1\) and give analytic estimations of these parameters. Recall that with a coarse-graining parameter m, we select the internal knots for polynomial interpolation among \(x_{Nk/M} = kL/M\), \(k=1,\ldots , M-1\), and the distance between two adjoint knots is at most L/M. Using the classical error estimation for the spline method [35, 43, 46], assuming \(V\in C^{p+1}[0,L]\), we have
$$\begin{aligned} \max _{j=0,\ldots ,N-1}|V(x_j)-{\tilde{V}}(x_j)|&\le \max _{x\in [0,L]}|V(x)-{\tilde{V}}(x)| \nonumber \\&\le C_p\max _{x\in [0,L]}|V^{(p+1)}(x)| \left( \frac{L}{M}\right) ^{p+1} \le \delta . \end{aligned}$$
(A2)
Here, \(C_p\) is a pre-constant depending on degree p. The optimal pre-constant is known for several types of spline methods, which can be found in the above references. The word “optimal" means that the second inequality becomes equality in the worst case (see “Appendix B.1” for more details). Thus, we estimate m for given degree p and precision \(\delta \) as
$$\begin{aligned} m = \Bigg \lceil \log _2 \left( L\left( \frac{C_p\Vert V^{(p+1)}\Vert _\infty }{\delta }\right) ^{\frac{1}{p+1}}\right) \Bigg \rceil . \end{aligned}$$
For arbitrarily fixed \(x\in [0,L]\), let h(x) solve
$$\begin{aligned} C_p \max _{y\in [x,x+h(x)]} |V^{(p+1)}(y)| (h(x))^{p+1} = \delta . \end{aligned}$$
Note that h(x) indicates the maximal interval size at x such that the error is upper bound by \(\delta \). Then the number of intervals \({\tilde{M}}\) can be approximated by
$$\begin{aligned} {\tilde{M}} \approx \int _0^L \frac{1}{h(x)} dx \ge \int _0^L \left( \frac{C_p|V^{(p+1)}(x)|}{\delta }\right) ^{\frac{1}{p+1}} dx =: {\hat{M}}_{p}. \end{aligned}$$
In general, we cannot take the number of polynomial \({\tilde{M}}\) as the above lower bound \({\hat{M}}_{p}\), and \({\tilde{M}}\) (possibly depending on the set of knots \(\{kL/M\}_{k=1}^{M-1}\)) is numerically calculated by some algorithms, for example, the ones we propose in Appendix A.5. However, we observe that \({\hat{M}}_{p} \le {\tilde{M}} \le c {\hat{M}}_{p}\le M\) for some constant \(c>1\), which becomes close to 1 as \(\delta \rightarrow 0\) (see “Appendix B.2”). Therefore, we can approximately use \({\hat{M}}_{p}\) for the asymptotic gate count.

Arithmetic operations and phase kickback

According to [5, 2427], we provide the gate implementation and estimate the gate count and the required number of ancillary qubits in each step mentioned in Sect. 2.
For step 1, Häner et al. [26] suggested using a comparator with a given integer by a carry circuit [47], an incrementer in [42], and the uncomputation for each interval of the piecewise function. The comparator in [47] on an n-qubit system register uses \(4(n-1)\) Toffoli gates, (at most) \(2n+2\) CNOT gates, and (at most) \(2(n-1)\) NOT gates with n uninitialized ancillas and a zero-initialized ancilla. On the other hand, the incrementer in [42] uses \(4(n-1)\) Toffoli gates, \(10(n-1)\) CNOT gates, and \(2n-1\) NOT gates with n uninitialized ancillas, which implies that step 1 requires \(12(n-1)({\tilde{M}}-1)\) Toffoli gates, \((14n-6)({\tilde{M}}-1)\) CNOT gates, and \((6n-5)({\tilde{M}}-1)\) NOT gates with n uninitialized ancillas, a zero-initialized ancilla, and a zero-initialized register of \(\lceil \log _2 {\tilde{M}}\rceil \) qubits.
For step 2, according to the QROM circuit in [25, Fig. 10], loading classical data to a data register of \(b_{\text {data}}\) qubits require \(2({\tilde{M}}-1)\) Toffoli gates, (at most) \(b_{\text {data}}{\tilde{M}} + {\tilde{M}} -1\) CNOT gates, and \(2({\tilde{M}}-1)\) NOT gates with \(\lceil \log _2 {\tilde{M}}\rceil +1\) initialized ancillas and a zero-initialized register of \(b_{\text {data}}\) qubits. Owing to the specific structure, the circuit can be simplified to \(4({\tilde{M}}-1)\) T gates, \((b_{\text {data}}+\mathcal {O}(1)){\tilde{M}}\) CNOT gates, and \(\mathcal {O}({\tilde{M}})\) Hadamard or S gates [25, Fig. 4].
In our formulation, we regard the system register as an integer register whose value is proportional to the grid points with a multiplier \(\Delta x = L/N\). Then, the operations in step 1 can be executed with an integer comparator at a relatively cheap cost. By noting
$$\begin{aligned} f_\ell (x_j)&= a_p^{(\ell )} x_j^p + \cdots + a_1^{(\ell )} x_j + a_0^{(\ell )}\\ &= \left( a_p^{(\ell )}(\Delta x)^p\right) j^p + \cdots + (a_1^{(\ell )}\Delta x) j + a_0^{(\ell )}, \end{aligned}$$
for each \(\ell =0,1,\ldots ,{\tilde{M}}-1\), we need to load \(p+1\) coefficients to the data registers, which store non-integer values. For simplicity, we follow the idea of [26] to consider fixed-point arithmetic and represent a number y in a non-integer register using \(b_{\text {tot}}\) binary bits as
$$\begin{aligned} y = y_{b_{\text {tot}}-1}\cdots y_{b_{\text {dec}}}.y_{b_{\text {dec}}-1}\cdots y_0, \end{aligned}$$
where \(b_{\text {dec}}\) is the number of bits after the decimal point. Henceforth, we set \(b_{\text {data}}=b_{\text {tot}}\), which will be estimated later. In other words, we use the same \((b_{\text {tot}},b_{\text {dec}})\) binary representation for all the data registers. To sum up, step 2 requires \(2(p+1)({\tilde{M}}-1)\) Toffoli gates, (at most) \((p+1)(b_{\text {tot}}{\tilde{M}} + {\tilde{M}} -1)\) CNOT gates, and \(2(p+1)({\tilde{M}}-1)\) NOT gates with \(\lceil \log _2 {\tilde{M}}\rceil +1\) initialized ancillas and zero-initialized registers of totally \((p+1)b_{\text {tot}}\) qubits.
For step 3, we first introduce another non-integer register for the grid points: \({|{x(j)}\rangle }\) with the same parameters \((b_{\text {tot}},b_{\text {dec}})\). By noting that \(x(j)=x_j=j\Delta x\), the preparation of such a register boils down to a fixed-point multiplication of j/N and L. According to [26, Appendix B], such a fixed-point multiplication uses controlled addition circuits on k qubits for \(k=b_{\text {dec}}+1,\ldots ,b_{\text {tot}}\) and another controlled addition circuits on k qubits for \(k=b_{\text {tot}}-b_{\text {dec}},\ldots ,b_{\text {tot}}-1\) with \(b_{\text {tot}}\) zero-initialized ancillas. By employing the controlled addition circuits in [48], the fixed-point multiplication uses \(\frac{3}{2}b_{\text {tot}}^2 + 3b_{\text {dec}}b_{\text {tot}} - 3b_{\text {dec}}^2 +\frac{9}{2}b_{\text {tot}}-3b_{\text {dec}}\) Toffoli gates and \(\mathcal {O}(b_{\text {tot}}^2)\) CNOT gates. Here, we mention that due to the truncation of the lower qubits if \(n>b_{\text {dec}}\), the values in the register \({|{x(j)}\rangle }\) have an error of at most \(L/2^{b_{\text {dec}}}\).
Also, we use the above \((b_{\text {tot}},b_{\text {dec}})\) binary representation for the result register \({|{{\tilde{V}}(x_j)}\rangle }\). Then, by applying the Horner scheme, we find that the implementation of the piecewise polynomial in a result register requires p fixed-point addition and p fixed-point multiplication operations:
$$\begin{aligned} ket{0} \mapsto {|{a_p^{(\ell )} x(j)+a_{p-1}^{(\ell )}}\rangle }&\mapsto {|{a_p^{(\ell )} x(j)^2+a_{p-1}^{(\ell )} x(j)+a_{p-2}^{(\ell )}}\rangle } \\&\mapsto \cdots \mapsto {|{a_p^{(\ell )} x(j)^p + \cdots + a_1^{(\ell )} x(j) + a_0^{(\ell )}}\rangle }. \end{aligned}$$
Then, step 3 requires \(p\left( \frac{3}{2}b_{\text {tot}}^2 + 3b_{\text {dec}}b_{\text {tot}} - 3b_{\text {dec}}^2 +\frac{13}{2}b_{\text {tot}}-3b_{\text {dec}}-1\right) \) Toffoli gates and \(\mathcal {O}(p b_{\text {tot}}^2)\) CNOT gates with \(p+1\) zero-initialized result registers containing \((p+1)b_{\text {tot}}\) qubits. Here, we do not include the gate count of generating \({|{x(j)}\rangle }\) as there may be a better way to do it, and this yields at most a pre-factor to the above estimation. Moreover, we mention that the enduring ancillary qubits can be reduced to \(2b_{\text {tot}}\) if one finds and employs an efficient in-place fixed-point multiplication. Furthermore, considering the truncation errors in the registers \({|{x(j)}\rangle }\) and \({|{a_k^{(\ell )}}\rangle }\), the accumulated error in the result register is upper bound by \(c_p(1+\sum _{q=1}^p(|a_q|_\infty +1) L^q)/2^{b_{\text {dec}}}\) where \(|a_p|_\infty \) denotes the maximum of \(|a_p^{(\ell )}|\) over all the intervals \([{\tilde{x}}_\ell , {\tilde{x}}_{\ell +1})\), \(\ell =0,1,\ldots ,{\tilde{M}}-1\) and \(c_p\) is some constant that possibly depends on p.
Now, we estimate the parameters \(b_{\text {dec}}\) and \(b_{\text {tot}}\) for a given precision \(\delta /2\). For the number of bits after the decimal point, the above estimate immediately gives
$$\begin{aligned} b_{\text {dec}} = 1+\Bigg \lceil \log _2 c_p(1+\sum _{q=1}^p(|a_q|_\infty +1) L^q)/\delta \Bigg \rceil , \end{aligned}$$
which is sufficient. On the other hand, for the integer part of the register, we have to avoid overflow in the fixed-point arithmetic operations to guarantee that the values in the result register approximate the calculated piecewise polynomial function. Then, the integer part should be able to describe values up to \(\sum _{q=0}^p |a_q|_\infty L^q\), which implies that a total number of
$$\begin{aligned} b_{\text {tot}} = 1 + \Bigg \lceil \log _2 c_p(1+\sum _{q=1}^p(|a_q|_\infty +1) L^q)/\delta \Bigg \rceil +\Bigg \lceil \log _2 \sum _{q=0}^p|a_q|_\infty L^q \Bigg \rceil \end{aligned}$$
is sufficient to derive a precision \(\delta /2\) in the result register. In particular, if \(p=1\), we have its upper bound as
$$\begin{aligned} b_{\text {tot}}= & 1 + \Bigg \lceil \log _2 (L\Vert V^{(1)}\Vert _\infty + \Vert V\Vert _\infty ) \Bigg \rceil \\ & + \Bigg \lceil \log _2 (1 + L + L\Vert V^{(1)}\Vert _\infty )\Bigg \rceil + \lceil \log _2 1/\delta \rceil . \end{aligned}$$
We remark that the number of qubits in the register for the result depends on the length L, some \(L^\infty \) norms of the potential, as well as its derivatives, and the desired precision \(\delta \), but is independent of the grid parameter n. Moreover, if \(x_j\) happens to have a \(b_{\text {tot}}\)-bit binary representation for all \(j=0,1,\ldots ,N-1\), then there is no error in the register \({|{x(j)}\rangle }\), and we find that
$$\begin{aligned} b_{\text {dec}} = 1+ \Bigg \lceil \log _2 \sum _{q=0}^p L^q/\delta \Bigg \rceil \end{aligned}$$
is enough.
For step 4, we use the \((b_{\text {tot}}+b_{\text {ext}})\)-bit binary representation for \({|{\gamma (\xi )}\rangle }\), where \(b_{\text {ext}}\) is some parameter for extra qubits. According to [24], the phase kickback is approximated by a modular adder between the result register (we can regard it as an integer register this time) and a prepared phase kickback register, which is derived by loading an integer smaller than \(2^{b_{\text {tot}}+b_{\text {ext}}}\) and applying a QFT circuit. Therefore, step 4 requires a QFT on \((b_{\text {tot}}+b_{\text {ext}})\) qubits, \(2(b_{\text {tot}}+b_{\text {ext}})-1\) Toffoli gates, and \(\mathcal {O}(b_{\text {tot}}+b_{\text {ext}})\) CNOT gates with \((b_{\text {tot}}+2b_{\text {ext}})\) zero-initialized ancillas for the registers and addition by using the adder in [48]. In order to guarantee a precision \(\delta /4\), the parameter \(b_{\text {ext}}\) should satisfy
$$\begin{aligned} b_{\text {ext}} = b_{\text {dec}}-b_{\text {tot}} + \Bigg \lceil \log _2 \frac{8\pi \Vert V\Vert _\infty }{\delta }\Bigg \rceil . \end{aligned}$$
Finally, step 5 includes the uncomputation, which doubles the gate count and circuit depth in steps 1–3.
If we assume the potential V is sufficiently smooth with bounded maximum norms and pay attention only to the parameters p, n, and \(\delta \), then APK uses \(\mathcal {O}\left( (n+p)\delta ^{-1/(p+1)}+p(p+\log (1/\delta ))^2\right) \) Toffoli gates, \(\mathcal {O}\left( (n+p)\delta ^{-1/(p+1)}\right. \left. +p(p+\log (1/\delta ))^2\right) \) Clifford gates, and \(\mathcal {O}\left( (\log (1/\delta ))^2\right) \) phase gates with \(\mathcal {O}(1+p(p+\log (1/\delta )))\) initialized ancillary qubits.

Spline method

We end this section with a proposal of classical algorithms to find the coefficients of a suitable piecewise polynomial approximation under a given precision. We note that the problem of finding piecewise polynomial approximations for a given function under a given precision does not admit uniqueness as one has many parameters such as the number of sub-intervals \({\tilde{M}}\) and the degrees of the polynomial \(p_\ell \), \(\ell =1,2,\ldots ,{\tilde{M}}\) in each sub-interval. Moreover, even if the above parameters are given, finding an optimal piecewise polynomial approximation is not unique because we have selections on the choice of the norm (used to evaluate the difference between the approximate and the original function).
Here, we use the maximum/supremum norm and consider the spline interpolation to derive the piecewise polynomial, which is efficient and avoids the problem of Runge’s phenomenon. There are other possible methods, for example, the least square method or minimax approximation based on solving a minimization problem. Here, we choose the spline method since there are well-established theoretical error estimations with known optimal pre-constants [35, 43, 46], which helps us avoid numerically solving a minimization problem if the polynomial degree is specified in advance and enables us to estimate the parameters analytically. In particular, evaluations of the division parameter m and number of sub-intervals \({\tilde{M}}\) yield the gate count/depth of the quantum circuit.
Our first proposal is for the case where the polynomial degree is a given integer. The procedure is listed as follows:
Algorithm 1 Fixed degree piecewise polynomial approximation
1.
Given a function V with domain [0, L], precision \(\delta >0\), polynomial degree p, calculate the \((p+1)\)-th derivative of V, and the optimal pre-constant \(C_p\) which can be found in [35, 43, 46].
 
2.
Calculate the division parameter \(m\in \mathbb {N}\) explicitly by
$$\begin{aligned} m = \Bigg \lceil \log _2 \left( L\left( \frac{C_p\Vert V^{(p+1)}\Vert _\infty }{\delta }\right) ^{\frac{1}{p+1}}\right) \Bigg \rceil , \end{aligned}$$
which defines \(2^m\) sub-intervals \(I_j=[jL/2^m, (j+1)L/2^m]\), \(j=0,1,\ldots ,2^m-1\).
 
3.
(i) Let \(j_1=0\), \(j_2=1\). (ii) Let \(j_2 \leftarrow j_2+1\), calculate the maximum norm of the \((p+1)\)-th derivative of V in the interval \([j_1L/2^m, j_2L/2^m]\) denoted by \(\Vert V_{j_1,j_2}^{(p+1)}\Vert _\infty \). If
$$\begin{aligned} C_p\Vert V_{j_1,j_2}^{(p+1)}\Vert _\infty \left( (j_2-j_1)L/2^m\right) ^{p+1}\le \delta , \end{aligned}$$
then go to the beginning of (ii). Otherwise, save \(j_2-1\) and update \(j_1\leftarrow j_2-1\), \(j_2\leftarrow j_1+1\), and go to the beginning of (ii). (iii) Continue (ii) and stop when \(j_2=2^m+1\). Then, a rough division of \({\tilde{M}}\le 2^m\) sub-intervals is obtained according to the saved numbers.
 
4.
Using the \({\tilde{M}}+1\) endpoints of the derived \({\tilde{M}}\) sub-intervals as knots, calculate the coefficients of the polynomial in each sub-interval by applying the p-th degree spline method.
 
As we know, by quantum computing, we divide the interval into \(2^m\) sub-intervals, and Steps 1–2 calculate explicitly the minimal m such that the error is upper bound by \(\delta \) even in the “worst" (most rapidly varying) sub-interval of V. Since the error by spline interpolation is very small in some mildly varying parts of V, Step 3 combines several sub-intervals so that the error in every new sub-intervals would be close to the given error bound \(\delta \). Of course, one can skip Step 3 and set \({\tilde{M}}=2^m\) in Step 4 by using the sub-intervals \(I_j\) in Step 2, which yields a uniform division, but Step 3 gives a more flexible non-uniform division, which reduces the number of sub-intervals.
Next, we propose a more involved algorithm where the polynomial degree can differ in each sub-interval. As mentioned above, there is never uniqueness in such a varying degree case, and we need to introduce some object function to obtain one unique solution.
Algorithm 2 Varying degree piecewise polynomial approximation
1.
Given a function V with domain [0, L], precision \(\delta >0\), grid parameter n, and admissible sets \(\mathcal {M} = \{m_{\text {min}}, \ldots , m_{\text {max}}\}\), \(\mathcal {P} = \{p_{\text {min}}, \ldots , p_{\text {max}}\}\) with \(m_{\text {max}}\le n\).
 
2.
For each \(m\in \mathcal {M}\), define \(2^m\) sub-intervals \(I_k=[kL/2^m, (k+1)L/2^m]\), \(k=0,1,\ldots ,2^m-1\). Then, calculate the polynomial degree for each \(k=0,1,\ldots ,2^m-1\) by solving the following constrained (minimization) problem:
$$\begin{aligned} p_k = p_k(m):= \min \left\{ p\in \mathcal {P}: \, C_p\Vert V_{k,k+1}^{(p+1)}\Vert _\infty \left( L/2^m\right) ^{p+1}\le \delta \right\} , \end{aligned}$$
(A3)
where \(C_p\) denotes the optimal pre-constant in [35, 43, 46], and \(\Vert V_{k,k+1}^{(p+1)}\Vert _\infty \) is the same notation as that in Algorithm 1.
 
3.
For each \(m\in \mathcal {M}\), execute the following steps. (i) Let \(j_1=0\), \(j_2=1\). (ii) Let \(j_2 \leftarrow j_2+1\). If \(p_{j_2-1}(m)\not =p_{j_1}(m)\), then save \(j_2-1\) and update \(j_1\leftarrow j_2-1\), \(j_2\leftarrow j_1+1\), and go to the beginning of (ii). Else if
$$\begin{aligned} C_{p_{j_1}}\Vert V_{j_1,j_2}^{(p_{j_1}+1)}\Vert _\infty \left( (j_2-j_1)L/2^m\right) ^{p_{j_1}+1}\le \delta , \end{aligned}$$
then go to the beginning of (ii). Otherwise, save \(j_2-1\) and update \(j_1\leftarrow j_2-1\), \(j_2\leftarrow j_1+1\), and go to the beginning of (ii). (iii) Continue (ii) and stop when \(j_2=2^m+1\). Then, a rough division of \({\tilde{M}} = {\tilde{M}}(m)\le 2^m\) sub-intervals is obtained according to the saved numbers.
 
4.
Based on the parameter \(m\in \mathcal {M}\), the calculated \(p_{k}(m)\) and \({\tilde{M}}(m)\), define an objective function \(F(m;n)={\tilde{F}}(m,p_{k}(m),{\tilde{M}}(m);n)\) where \({\tilde{F}}\) could be CNOT count for the quantum circuit implementing the piecewise polynomial function with parameters \(n,m,{\tilde{M}}\), and \(p_k\). Solve the minimization problem:
$$\begin{aligned} m^*= \textrm{argmin}_{m\in \mathcal {M}} F(m;n). \end{aligned}$$
(A4)
 
5.
According to the parameters \(m^*\), \({\tilde{M}}^*= {\tilde{M}}(m^*)\), and \(p_k^*= p_k(m^*)\), \(k=0,1,\ldots ,2^{m^*}-1\), calculate the coefficients of the \({\tilde{M}}^*\) polynomials by applying the \(p_{k_j}^*\)-th degree spline method for each interval with index \(j=0,1,\ldots ,{\tilde{M}}^*-1\).
 
Algorithm 2 is a specific classical algorithm to find a suitable piecewise polynomial approximation to a given function corresponding to, for example, its quantum implementation. Here, we need to introduce two finite admissible sets \(\mathcal {M}\), \(\mathcal {P}\) to guarantee the well-posedness of the minimization problems (A3) and (A4) (unique existence of the solution to each problem). Moreover, for the given function V and precision \(\delta \), \(m_{\text {max}}\), and \(p_{\text {max}}\) must be chosen sufficiently large, or there could be no solution to the minimization problems. The comparison between Algorithms 1 and 2 is provided in Appendix B.1.

Comparison of different methods

Comparison of classical algorithms

In this subsection, we compare the classical algorithms based on the spline methods in terms of an example. Assume that we are given the modified Coulomb potential \(V(x)=A/\sqrt{a^2+(x-L/2)^2}\), \(x\in [0,L]\) with \(A=1\), \(L=20\), and \(a^2=0.5\). The numerically calculated coarse-graining parameter m and sub-interval number \({\tilde{M}}\) by several different spline interpolations are listed in Table 4. Comparing \({\tilde{M}}\) for [a] and [b], we find Step 3 in Algorithm 1 (non-uniform division) heavily reduces the number of intervals. In addition, the results for [d] and [e] show that the cubic Hermite spline outperforms the cubic spline of type I/type II [43] since its optimal pre-constant is smaller. Moreover, focusing on [b], [c], and [e], we find that a higher-degree spline interpolation gives smaller parameters m and \({\tilde{M}}\). Furthermore, Algorithm 2 provides “worse" parameters than Algorithm 1 by noting [e] and [f]. However, it gives a better CNOT count because the quantum circuit for a higher-degree polynomial is more expensive, and it balances the polynomial degree regarding the cost of quantum implementation. Using the quantum circuit of PPP, we can calculate the CNOT count of the quantum circuit analytically, and the result is given in Table 5.
Table 4
Parameters m, \({\tilde{M}}\) by different splines with respect to \(\delta \) for modified Coulomb potential with \(A=1\), \(a^2=0.5\), \(L=20\)
m, \({\tilde{M}}\)
\(\delta =10^{-1}\)
\(\delta =10^{-2}\)
\(\delta =10^{-3}\)
\(\delta =10^{-4}\)
\(\delta =10^{-6}\)
Uniform \(\hbox {linear}^\mathrm{[a]}\)
6, 64
7, 128
9, 512
11, 2048
14, 16384
\(\hbox {Linear}^\mathrm{[b]}\)
6, 12
7, 26
9, 70
11, 216
14, 2072
\(\hbox {Quadratic}^\mathrm{[c]}\)
5, 8
6, 16
7, 30
8, 60
11, 270
\(\hbox {Cubic}^\mathrm{[d]}\)
6, 12
6, 18
7, 28
8, 48
10, 128
Cubic \(\hbox {Hermite}^\mathrm{[e]}\)
5, 8
6, 12
7, 18
7, 30
9, 94
Optimized degree-\(\hbox {varying}^\mathrm{[f]}\)
6, 12
7, 26
8, 70
8, 124
10, 594
\(^\mathrm{[a]}\)Algorithm 1 without Step 3 for \(p=1\) and \(C_1=1/8\) (linear interpolation [35, 43])
\(^\mathrm{[b]}\)Algorithm 1 for \(p=1\) and \(C_1=1/8\) (linear interpolation [35, 43])
\(^\mathrm{[c]}\)Algorithm 1 for \(p=2\) and \(C_2=2/81\) (two-point Hermite interpolation [35])
\(^\mathrm{[d]}\)Algorithm 1 for \(p=3\) and \(C_3=5/384\) (cubic spline [43])
\(^\mathrm{[e]}\)Algorithm 1 for \(p=3\) and \(C_3=1/384\) (cubic Hermite spline [35, 43])
\(^\mathrm{[f]}\)Algorithm 2 for \(\mathcal {P}=\{1,2,3\}\), \(\mathcal {M}=\{2,\ldots ,n\}\) and \(C_1=1/8\), \(C_2=2/81\), \(C_3=1/384\) with \(n=19\). Here, the objective function is set to be CNOT count corresponding to PPP (see “Appendix A.3”)
Table 5
CNOT count of PPP by different splines with respect to \(\delta \) for modified Coulomb potential with \(A=1\), \(a^2=0.5\), \(L=20\), and \(n=19\)
CNOT count
\(\delta =10^{-1}\)
\(\delta =10^{-2}\)
\(\delta =10^{-3}\)
\(\delta =10^{-4}\)
\(\delta =10^{-6}\)
Uniform \(\hbox {linear}^\mathrm{[a]}\)
20538
54610
350546
2059282
26311098
\(\hbox {Linear}^\mathrm{[b]}\)
3586
10750
47334
216290
3326026
\(\hbox {Quadratic}^\mathrm{[c]}\)
11584
25752
52484
113504
638948
\(\hbox {Cubic}^\mathrm{[d]}\)
239908
366352
579900
1009100
2749516
Cubic \(\hbox {Hermite}^\mathrm{[e]}\)
154996
239908
368120
622256
2001456
Optimized degree-\(\hbox {varying}^\mathrm{[f]}\)
3586
10750
44790
121002
1015862
\(^\mathrm{[a]}\)Algorithm 1 without Step 3 for \(p=1\) and \(C_1=1/8\) (linear interpolation [35, 43])
\(^\mathrm{[b]}\)Algorithm 1 for \(p=1\) and \(C_1=1/8\) (linear interpolation [35, 43])
\(^\mathrm{[c]}\)Algorithm 1 for \(p=2\) and \(C_2=2/81\) (two-point Hermite interpolation [35])
\(^\mathrm{[d]}\)Algorithm 1 for \(p=3\) and \(C_3=5/384\) (cubic spline [43])
\(^\mathrm{[e]}\)Algorithm 1 for \(p=3\) and \(C_3=1/384\) (cubic Hermite spline [35, 43])
\(^\mathrm{[f]}\)Algorithm 2 for \(\mathcal {P}=\{1,2,3\}\), \(\mathcal {M}=\{2,\ldots ,n\}\) and \(C_1=1/8\), \(C_2=2/81\), \(C_3=1/384\) with \(n=19\). Here, the objective function is set to be CNOT count corresponding to PPP (see “Appendix A.3”)
With a fixed grid parameter \(n=19\), Table 5 confirms the better performance of the non-uniform spline beyond the uniform one and that the cubic Hermite spline yields fewer CNOT gates than the cubic spline of type I/type II. However, due to the polynomial factor \(n^p\) (exponentially increasing for p) in gate count, it is not correct that the higher degree we choose, the better performance we gain. The best polynomial degree in Algorithm 1 depends on n, \(\delta \), V, and the spline method one uses. In the above specific case, the quadratic spline is the best for sufficiently small \(\delta \). A slightly involved issue is the comparison between Algorithms 1 and 2. Algorithm 2 gives a reasonable spline without knowledge of degree p, but it does not outperform Algorithm 1 sometimes. We choose the smallest possible degree in each sub-interval in Algorithm 2 to reduce the difficulty of solving the minimization problem (on a classical computer), which prevents us from the trade-off by increasing the degree and decreasing the total number of sub-intervals. Of course, Algorithm 2 with \(\mathcal {P} = \{p_0\}\) outperforms Algorithm 1 for a given degree \(p_0\) because it addresses the trade-off by increasing the division parameter m and decreasing the total number of sub-intervals.
On the other hand, we give a remark on the optimal pre-constants in spline interpolations. According to the spline method one uses, such constant varies and can be found in [35, 43, 46]. The word “optimal" means that the error estimate (i.e., the second inequality in (A2)) would become equality in the worst case. The optimality is closely related to the asymptotic behavior as \(\delta \rightarrow 0\), which we illustrate by listing the numerical maximum errors in Table 6. We find that the maximum error between the spline interpolation and the original function is larger than \(\delta /2\) as \(\delta \le 0.01\) and tends to \(\delta \) as \(\delta \rightarrow 0\), which indicates that our algorithm based on these theoretical optimal constants would be efficient in high-precision cases. In contrast, it may give higher precision than desired for a large given \(\delta \) at the cost of using more resources.
Table 6
Numerical maximum error by different splines with respect to \(\delta \) for modified Coulomb potential with \(A=1\), \(a^2=0.5\), \(L=20\)
Maximum error
\(\delta =10^{-1}\)
\(\delta =10^{-2}\)
\(\delta =10^{-3}\)
\(\delta =10^{-4}\)
\(\delta =10^{-5}\)
\(\delta =10^{-6}\)
\(\delta =10^{-7}\)
\(\hbox {Linear}^\mathrm{[a]}\)
\(0.403\times 10^{-1}\)
\(0.811\times 10^{-2}\)
\(0.973\times 10^{-3}\)
\(0.980\times 10^{-4}\)
\(0.991\times 10^{-5}\)
\(0.997\times 10^{-6}\)
\(0.999\times 10^{-7}\)
\(\hbox {Quadratic}^\mathrm{[b]}\)
\(0.382\times 10^{-1}\)
\(0.797\times 10^{-2}\)
\(0.718\times 10^{-3}\)
\(0.906\times 10^{-4}\)
\(0.926\times 10^{-5}\)
\(0.962\times 10^{-6}\)
\(0.985\times 10^{-7}\)
Cubic \(\hbox {Hermite}^\mathrm{[c]}\)
\(0.167\times 10^{-1}\)
\(0.535\times 10^{-2}\)
\(0.772\times 10^{-3}\)
\(0.664\times 10^{-4}\)
\(0.691\times 10^{-5}\)
\(0.827\times 10^{-6}\)
\(0.952\times 10^{-7}\)
\(^\mathrm{[a]}\)Algorithm 1 for \(p=1\) and \(C_1=1/8\) (linear interpolation [35, 43])
\(^\mathrm{[b]}\)Algorithm 1 for \(p=2\) and \(C_2=2/81\) (two-point Hermite interpolation [35])
\(^\mathrm{[c]}\)Algorithm 1 for \(p=3\) and \(C_3=1/384\) (cubic Hermite spline [35, 43])

Relation between numerically calculated number of sub-intervals and its theoretical lower bound

We check the relation between the number of sub-intervals \({\tilde{M}}_p\) and its theoretical lower bound \({\hat{M}}_{p}\) for the polynomial degree \(p=1,2,3\), the precision \(\delta >0\), and two different functions.
The first function is the modified Coulomb potential in Example 1 of Sect. 3.2. Let \(A=1\) and \(L=20\) be fixed. We consider several values of parameter \(a^2\). For a given polynomial degree p, recall that \({\tilde{M}}_p\) is the number of sub-intervals obtained in Step 3 of Algorithm 1, while \({\hat{M}}_{p}\) is its theoretical lower bound defined in Appendix A.3. We plot the ratio \({\tilde{M}}_p/{\hat{M}}_p\) as a function of \(\delta \) under some different choices of \(a^2=0.5, 0.1, 0.001\) in Fig. 13. In the calculations of \({\tilde{M}}_p\) and \({\hat{M}}_p\), we use \(C_1=1/8\), \(C_2=2/81\), and \(C_3=1/384\), which corresponds to the two-point Hermite interpolations [35]. Besides, \({\hat{M}}_{p}\) is calculated by numerical integration (a Riemann sum) for given p and \(\delta \).
The second function is an oscillating one in Example 2 of Sect. 3.2. Let \(A=1\), \(\omega =2\), and \(L=20\) be fixed. Again, we plot the ratio \({\tilde{M}}_p/{\hat{M}}_p\) as a function of \(\delta \) under some different choices of \(a=1, 0.1, 0.001\) in Fig. 14. Figures 13 and 14 show that for sufficiently small \(\delta \), the lower bound \({\hat{M}}_{p}\) gives a good estimation of the number of intervals \({\tilde{M}}_p\) up to a constant c (i.e., \({\tilde{M}}_p\le c{\hat{M}}_{p}\)). Moreover, although such a constant varies for different functions and precision bounds, it is smaller than 7/3 and becomes closer to 1 for smaller \(\delta \).
Fig. 13
(Color online) Ratio between the number of sub-intervals \({\tilde{M}}_p\) by Algorithm 1 and its lower bound \({\hat{M}}_p\) against the reciprocal of precision \(1/\delta \) for the modified Coulomb potential with fixed \(A=1\), \(L=20\), and varying parameter \(a^2=0.5, 0.1, 0.001\). In each subplot, we demonstrate the change of the ratios for different spline interpolations as \(\delta \) decreases
Full size image
Fig. 14
(Color online) Ratio between the number of sub-intervals \({\tilde{M}}_p\) by Algorithm 1 and its lower bound \({\hat{M}}_p\) against the reciprocal of precision \(1/\delta \) for an oscillating function with fixed \(A=1\), \(\omega =2\), \(L=20\), and varying parameter \(a=1, 0.1, 0.001\). In each subplot, we demonstrate the change of the ratios for different spline interpolations as \(\delta \) decreases
Full size image

Analytical comparison of WAL, LIU, and PPP

In the case study, we find that PPP with \(p=2\) uses fewer CNOT gates than PPP with \(p=3\) in practical settings. Here, we clarify when such observation is valid under some assumptions. We assume \(V\in C^{q}[0, L]\) for some \(q\in \mathbb {N}\) and define \({\tilde{V}}_p(x):=(C_p|V^{(p+1)}(x)|)^{1/(p+1)}\) for \(x\in [0, L]\) and \(p=0,1,\ldots ,q-1\), where \(C_p\) is the optimal constant for spline interpolations [35]. For each p, we introduce several constants:
$$\begin{aligned} C_{p,1}:= \Vert {\tilde{V}}_{p}\Vert _{L^1(0,L)} = \int _0^L |{\tilde{V}}_{p}(x)| dx, \quad C_{p,\infty }:= L \Vert {\tilde{V}}_{p}\Vert _{L^\infty (0,L)} = L \Vert {\tilde{V}}_p\Vert _{\infty }. \end{aligned}$$
Using these constants, we can express \(m_p\) and \({\hat{M}}_p\) in Appendix A.3 by
$$\begin{aligned} m_p = \Bigg \lceil \log _2 \left( C_{p,\infty }\delta ^{-1/(p+1)}\right) \Bigg \rceil , \quad {\hat{M}}_p = \Bigg \lceil C_{p,1}\delta ^{-1/(p+1)}\Bigg \rceil . \end{aligned}$$
(B5)
We note that \(C_{p,1}=C_{p,\infty }=0\) if and only if V is a p-th degree polynomial in [0, L]. In such trivial cases, we can take \({\tilde{M}}_p=1\), and the implementation by PPP is precise and exhibits a lower gate count compared to WAL in the case of small \(\delta \). Therefore, we mainly consider the cases that \(C_{p,1}\) and \(C_{p,\infty }\) do not vanish. From Eq. (B5) and the facts that \(C_{p,1}, C_{p,\infty }\) are independent of \(\delta \), we find that \({\tilde{M}}_p\ge {\hat{M}}_p=\mathcal {O}(\delta ^{-1/(p+1)})\) for sufficiently small \(\delta \) provided that \(C_{p,1}\) is bounded from below, which implies that for a given function V and a sufficiently small \(\delta \), a non-uniform division has similar performance as a uniform division regarding \(\delta \). According to the discussion in Appendix B.2, we further assume that there exists a \(\delta \)-independent constant \(c\ge 1\) such that \({\hat{M}}_p\le {\tilde{M}}_p\le c{\hat{M}}_p\) holds for any \(p\in \mathbb {N}\). Thus, we have
$$\begin{aligned} {\tilde{C}}:= \frac{{\tilde{M}}_2}{{\tilde{M}}_3}\delta ^{1/12}\in \left[ c^{-1}\frac{C_{2,1}}{C_{3,1}}, c\frac{C_{2,1}}{C_{3,1}}\right] . \end{aligned}$$
This indicates that \({\tilde{C}}\) is upper and lower bound by some constants that are independent of \(\delta \). Using the analytic CNOT count for each method, for a fixed precision \(\delta >0\), we obtain
\(CC_{\text {WAL}}(n) = 2^{n}-2\), \(n\in [1, m_0]\),
\(CC_{\text {LIU}}(n) = 2(n-m_1)(2^{m_1}+2m_1^2-2)+2^{m_1}-2\), \(n\in [m_1, m_0]\),
\(CC_{\text {mLIU}}(n) = (n-m_1)(3\cdot 2^{m_1}-4)+2^{m_1}-2\), \(n\in [m_1, m_0]\),
\(CC_{\text {PPP1}}(n) = (2n+8m_1^2)({\tilde{M}}_1-1)\), \(n\in [m_1, m_0]\),
\(CC_{\text {PPP2}}(n) = n(n-1)+(2n+4n(n-1)+8m_2^2)({\tilde{M}}_2-1)\), \(n\in [m_2, m_0]\),
\(CC_{\text {PPP3}}(n) = n(n-1)+4n(n-1)(n-2)/3+(2n+4n(n-1)+10n(n-1)(n-2)/3+8m_3^2)({\tilde{M}}_3-1)\), \(n\in [m_3, m_0]\).
First, we compare PPP with different polynomial degrees. Define the difference of CNOT count between PPP with \(p=2\) and PPP with \(p=3\) by
$$\begin{aligned} g(n)&:= CC_{\text {PPP2}}(n)-CC_{\text {PPP3}}(n)\\&= (2n+4n(n-1)+8m_2^2)({\tilde{M}}_2-{\tilde{M}}_3)\\&\quad -\left( 8m_3^2-8m_2^2+\frac{10}{3}n(n-1)(n-2)\right) ({\tilde{M}}_3-1)\\&\quad - \frac{4}{3}n(n-1)(n-2)\\&=-\frac{1}{3}(10{\tilde{M}}_3-6)n^3 + (6{\tilde{M}}_3+4{\tilde{M}}_2-6)n^2\\&\quad -\frac{1}{3}(14{\tilde{M}}_3+6{\tilde{M}}_2-12)n + 8m_2^2({\tilde{M}}_2-1) \\&\quad -8m_3^2({\tilde{M}}_3-1). \end{aligned}$$
Recall that \(L/2^{m_p}\) and \({\tilde{M}}_p\) are the size of the sub-interval and the number of sub-intervals to guarantee the precision \(\delta \) regarding p-th degree spline interpolation. It is natural to assume
$$\begin{aligned} m_1\ge m_2\ge m_3 \ge 1 \quad \text{ and } \quad {\tilde{M}}_2\ge {\tilde{M}}_3\ge 1, \end{aligned}$$
(B6)
which implies \(g(0)>0\). We calculate the derivative of g with respect to n:
$$\begin{aligned} g^\prime (n)&= -(10{\tilde{M}}_3-6)n^2 + (12{\tilde{M}}_3+8{\tilde{M}}_2-12)n -(14{\tilde{M}}_3+6{\tilde{M}}_2-12)/3. \end{aligned}$$
By assumption (B6), we can check
$$\begin{aligned} (12{\tilde{M}}_3+8{\tilde{M}}_2-12)^2-4(10{\tilde{M}}_3-6)(14{\tilde{M}}_3+6{\tilde{M}}_2-12)/3>0. \end{aligned}$$
Then, letting \(g^\prime (n)=0\), we have g(n) takes its local minimum/maximum at
$$\begin{aligned} n_{\pm }^*&= \frac{3{\tilde{M}}_3+2{\tilde{M}}_2-3 \pm \sqrt{(3{\tilde{M}}_3+2{\tilde{M}}_2-3)^2-(5{\tilde{M}}_3-3)(7{\tilde{M}}_3 +3{\tilde{M}}_2-6)/3}}{5{\tilde{M}}_3-3}\\&= \frac{(2{\tilde{C}}\delta ^{-1/12}+3){\tilde{M}}_3-3 \pm \sqrt{((2{\tilde{C}}\delta ^{-1/12}{+}3){\tilde{M}}_3-3)^2{-}(5{\tilde{M}}_3-3)((3{\tilde{C}} \delta ^{-1/12}+7){\tilde{M}}_3-6)/3}}{5{\tilde{M}}_3-3}. \end{aligned}$$
Therefore, \(g(n)<0\) for \(n\in [m_1, m_0]\) if and only if \(n_+^*\le m_1\) and \(g(m_1)<0\) or \(n_+^*> m_1\) and \(g(\min \{n_+^*, m_0\})<0\). By noting that \(n_+^*\), \(m_1\) depends on \(\delta \), if we have
$$\begin{aligned} \delta \in \Delta _V :=&\{\delta>0; n_+^*(\delta )\le m_1(\delta ), g(m_1(\delta ))<0\} \\ &\cup \{\delta>0; n_+^*(\delta )> m_1(\delta ), g(\min \{n_+^*(\delta ), m_0(\delta )\})<0\}, \end{aligned}$$
then we obtain \(g(n)<0\) for any \(n\in [m_1(\delta ), m_0(\delta )]\).
Second, we compare WAL and PPP with \(p=2\) in \(n\in (0, m_1]\). We define the difference of CNOT count between these two methods by
$$\begin{aligned} {\bar{g}}(n)&:= CC_{\text {WAL}}(n) - CC_{\text {PPP2}}(n)\\&= 2^n-2-n(n-1)-(4n^2-2n+8m_2^2)({\tilde{M}}_2-1). \end{aligned}$$
We can verify \({\bar{g}}(0)<0\). We calculate the derivative of \({\bar{g}}\) with respect to n:
$$\begin{aligned} {\bar{g}}^\prime (n)&= (\ln 2) 2^n -2n+1-(8n-2)({\tilde{M}}_2-1)\\&= (\ln 2) 2^n - (8{\tilde{M}}_2-6)n + (2{\tilde{M}}_2-1). \end{aligned}$$
Let \(n_1^*\) and \(n_2^*\) be the solutions to \({\bar{g}}^\prime (n) = 0\). Then, we can verify that \({\bar{g}}(n_1^*)<0\) and \({\bar{g}}(n_2^*)<0\) provided that \(m_2\ge 2\) (i.e., \(\delta <C_{2,\infty }^3/8\)). Therefore, \({\bar{g}}(n)<0\) for \(n\in (0,m_1]\) if \(m_2\ge 2\) and \({\bar{g}}(m_1)<0\). If we define
$$\begin{aligned} {\bar{\Delta }}_V:= \{0<\delta<C_{2,\infty }^3/8; {\bar{g}}(m_1(\delta ))<0\}, \end{aligned}$$
then we obtain \({\bar{g}}(n)<0\) for any \(n\in (0, m_1(\delta )]\) provided that \(\delta \in {\bar{\Delta }}_V\).
Third, we also check the difference between LIU and WAL by defining
$$\begin{aligned} {\tilde{g}}(n)&:= CC_{\text {LIU}} - CC_{\text {WAL}}\\&= 2(n-m_1)(2^{m_1}+2m_1^2-2)+2^{m_1}-2^n, \quad n\in [m_1, \infty ). \end{aligned}$$
It is clear that \({\tilde{g}}(m_1)=0\), and we calculate the derivative
$$\begin{aligned} {\tilde{g}}^\prime (n) = 2(2^{m_1}+2m_1^2-2)-(\ln 2) 2^n. \end{aligned}$$
Since \({\tilde{g}}^\prime (m_1)=(2-\ln 2)2^{m_1}+4m_1^2-4>0\) and \({\tilde{g}}^\prime (n)=0\) has only one solution, we conclude that \({\tilde{g}}(n)=0\) has two solutions \(n_3^*=m_1\) and \(n_4^*>m_1\). Therefore, if \(\delta \in {{\tilde{\Delta }}}_V:=\{\delta>0; m_0(\delta )> n_4^*(\delta )\}\), then \({\tilde{g}}(n)\le 0\) for \(n\in [n_4^*,m_0]\). In particular, we have \({\tilde{g}}(m_0)\le 0\), which implies that LIU outperforms WAL for large \(n\ge m_0\).
Table 7
Numerical evaluations of \(\Delta _V\), \({\bar{\Delta }}_V\), and \({{\tilde{\Delta }}}_V\) for the functions in Sect. 3.2
 
\(\Delta _V\cap \mathcal {A}_\Delta \)
\({\bar{\Delta }}_V\cap \mathcal {A}_\Delta \)
\({{\tilde{\Delta }}}_V\cap \mathcal {A}_\Delta \)
Ex. 1: \(a^2=0.5\)
\((1\times 10^{-9},0.1)\)
\((1\times 10^{-9},0.1)\)
\((1\times 10^{-9},7.5\times 10^{-3})\cup (8.6\times 10^{-3}, 1.5\times 10^{-2})\)
Ex. 1: \(a^2=0.1\)
\((1\times 10^{-9},0.1)\)
\((1\times 10^{-9},0.1)\)
\((1\times 10^{-9},1.9\times 10^{-2})\cup (2.4\times 10^{-2}, 3.8\times 10^{-2})\)
Ex. 1: \(a^2=0.001\)
\((1\times 10^{-9},0.1)\)
\((1.4\times 10^{-9},4.0\times 10^{-9})\cup (5.6\times 10^{-9},0.1)\)
\((1\times 10^{-9},0.1)\)
Ex. 2: \(a=1\)
\((1\times 10^{-9},0.1)\)
\((1\times 10^{-9},0.1)\)
\((1\times 10^{-9},3.4\times 10^{-2})\)
Ex. 2: \(a=0.1\)
\((1\times 10^{-9},0.1)\)
\((1\times 10^{-9},0.1)\)
\((1\times 10^{-9},3.7\times 10^{-2})\)
Ex. 2: \(a=0.001\)
\((1\times 10^{-9},5.8\times 10^{-9})\cup (7.9\times 10^{-9},1.2\times 10^{-8})\)
\((1\times 10^{-9},0.1)\)
\((1\times 10^{-9},3.9\times 10^{-2})\)
 
\(\cup (1.3\times 10^{-8},9.1\times 10^{-8})\cup (1.1\times 10^{-7},0.1)\)
   
The above discussion explains the reason why we observe (i) and (iii) in Sect. 3.2. By the definition of \(\Delta _V\), \({\bar{\Delta }}_V\), and \({{\tilde{\Delta }}}_V\), we find that they depend only on the function V. For the given functions in Sect. 3.2, we list these sets in Table 7 by numerical calculations. Here, \(\mathcal {A}_\Delta :=(1\times 10^{-9}, 0.1)\) denotes a specific set of \(\delta \). The minimal value is set as \(1\times 10^{-9}\) because this is sufficiently small for practical applications, and it is expensive for a classical computer to calculate a finer division than \(2^{30}\approx 10^{9}\) sub-intervals in Algorithm 1 or 2. In contrast, although the estimation of \({\tilde{M}}_p\) by Algorithm 1 or 2 is time-consuming for small \(\delta <1\times 10^{-9}\) on a classical computer, one can use the easily calculated theoretical bound \({\hat{M}}_p\) to obtain some subsets of \(\Delta _V\), \({\bar{\Delta }}_V\), and \({{\tilde{\Delta }}}_V\) instead. From the first two columns of Table 7, we verify that PPP with \(p=2\) outperforms PPP with \(p=3\) for \(m_1 \le n\le m_0\), and WAL outperforms PPP for \(0<n\le m_1\) provided that \(1.1\times 10^{-7}< \delta < 0.1\). Therefore, for most cases, we should choose WAL if the quantum resource is limited (i.e., \(n\le m_1\)), while we suggest comparing WAL, LIU, and PPP with \(p=2\) if \(n>m_1\). On the other hand, we may need to take the higher-degree piecewise polynomial approximations into consideration in the high-precision cases that \(\delta <10^{-9}\).

Comparison of methods on CNOT count/depth by Qiskit emulation

In Sect. 3, we gave a detailed comparison of WAL, LIU, and PPP on CNOT count by using analytical estimation for each method. In practice, such analytical evaluations deliver the corresponding theoretical upper bounds in the worst case. For example, we did not consider the possibility of canceling CNOT gates. In this subsection, we calculate the CNOT count and circuit depth of each method for several examples by using a quantum emulator (Qiskit [49]). We check whether a hidden function-independent improvement of each method exists (specifically in CNOT count) and compare the circuit depth based on the native gates of the IBM quantum computer.
First, we study the modified Coulomb potential as Example 1 with parameters \(A=1\), \(a^2=0.1\), and \(L=20\) under a relatively high precision \(\delta =0.001\) and a lower one \(\delta =0.1\). For \(n=m_1,m_1+1,\ldots ,m_0\), the CNOT counts and depths for WAL, LIU, and PPP are demonstrated in Fig. 15.
Fig. 15
(Color online) CNOT count/depth by Qiskit with respect to n for the modified Coulomb potential. The parameters \(A=1\), \(a^2=0.1\), and \(L=20\) are fixed. The first column corresponds to a relatively high precision \(\delta =0.001\), while the second corresponds to a relatively low precision \(\delta =0.1\). The gray lines in the subplots of the CNOT count are the guidelines of the theoretical CNOT count for each method discussed in Sect. 3. 2nd-degree PPP and 3rd-degree PPP denote PPP by Algorithm 1 for \(p=2\) and \(p=3\), respectively, while VD-PPP denotes PPP by Algorithm 2 for \(\mathcal {P}=\{1,2,3\}\) and \(\mathcal {M}=\{2,\ldots , n\}\) with the objective function taking the theoretical CNOT count
Full size image
Except for PPP with \(p=3\) (i.e., cubic Hermite spline), the theoretical CNOT count coincides with the CNOT count by Qiskit after decomposition into the native gates. Although it is possible to cancel a small part of CNOT gates when a high-degree spline is employed, in principle, it seems sufficient to discuss the theoretical CNOT count. On the other hand, Fig. 15 also shows the circuit depth after decomposition into the native gates using Qiskit [49]. For the lower precision \(\delta =0.1\), WAL is the best one among the methods. In contrast, for the higher precision \(\delta =0.001\), WAL is the best method for \(n\le 13\), and LIU is the best for \(n\ge 14\), which is the same observation as for CNOT count.
We also investigate a damped oscillating function as Example 2 with parameters \(A=1\), \(a=0.1\), \(\omega =2\), and \(L=20\) under a relatively high precision \(\delta =0.001\) and a lower one \(\delta =0.1\). We find a similar observation as Example 1 in Fig. 16, but small gaps between the theoretical CNOT counts and the CNOT counts by Qiskit for several methods, including 3rd-degree PPP. We believe these gaps come from the possible vanishing of some phase gates regarding a specific function, and it does not influence the comparison result whether we use the theoretical gate counts or the ones by Qiskit emulation.
Fig. 16
(Color online) CNOT count/depth by Qiskit with respect to n for a damped oscillating function. The parameters \(A=1\), \(a=0.1\), \(\omega =2\), and \(L=20\) are fixed. The first column corresponds to a relatively high precision \(\delta =0.001\), while the second corresponds to a relatively low precision \(\delta =0.1\). The gray lines in the subplots of the CNOT count are the guidelines of the theoretical CNOT count for each method discussed in Sect. 3. 2nd-degree PPP and 3rd-degree PPP denote PPP by Algorithm 1 for \(p=2\) and \(p=3\), respectively, while VD-PPP denotes PPP by Algorithm 2 for \(\mathcal {P}=\{1,2,3\}\) and \(\mathcal {M}=\{2,\ldots , n\}\) with the objective function taking the theoretical CNOT count
Full size image
If we focus on the circuit depth in Figs. 15 and 16, then we find an interesting observation that the depth for 2nd-degree PPP has a relatively small value when \(n=2^k\) for some \(k\in \mathbb {N}\). Since this is not observed for CNOT count, it indicates an efficient parallelization of 2nd-degree PPP for \(n=2^k\).

Application to first-quantized Hamiltonian simulation

Linear depth for the part of two-particle interaction potential

We prove that in the first-quantized Hamiltonian simulation for \(N_e\) particles, the circuit depth for the real-time evolution of the two-particle interaction potential is \(\mathcal {O}(N_e)\), though the number of interaction potentials is \(C^{N_e}_2 = \frac{1}{2}N_e(N_e-1) = \mathcal {O}(N_e^2)\). In fact, we can also include the one-particle potential and keep the linear depth with respect to the number of particles.
We characterize the one-particle and two-particle potentials by unordered pairs \(\{(i,j)\}_{i,j=1,\ldots ,N_e}\). Here, (ij) describes the interaction potential between the i-th particle and the j-th particle, and (ii) indicates the one-particle potential for the i-th particle.
In this formulation, our target is to seek \(N_e\) sets, the pairs are implemented simultaneously in each set, such that they include all the one-particle and two-particle potentials once and only once and any particle is referred only once in each set.
Without geometric constraints, the construction of the sets is not unique in general. Here, we provide only one candidate, which can be readily verified. We introduce the sets for an odd number, and then we use the introduced sets to define the ones for an even number. Henceforth, we denote the number of particles by N for simplicity.
  • For \(N = 2n-1\), \(n\in \mathbb {N}\), define the k-th set by
    $$\begin{aligned} [k]_{N}^{\text {odd}}:= \left\{ ([k-j]_N,[k+j]_N): j=0,\ldots ,n-1\right\} , \quad k=1,\ldots ,N. \end{aligned}$$
  • For \(N = 2n\), \(n\in \mathbb {N}\), define the k-th set by
    $$\begin{aligned} [k]_{N}^{\text {even}}&:= [k]_{N-1}^{\text {odd}} \setminus \{(k,k)\} \cup \{(k,N)\}, \quad k=1,\ldots ,N-1, \\ [N]_{N}^{\text {even}}&:= \{(j,j): j=1,\ldots ,N\}. \end{aligned}$$
Here, \([p]_N\in \{1,\ldots ,N\}\) denotes \(p \mod N\). For the convenience, we define \([N]_N = N\) instead of \([N]_N = 0\). By the definition of the sets, it is clear that for \(N=2n-1\), \(n\in \mathbb {N}\), each set includes n pairs, and hence, 2n numbers are referred in total. In order to achieve our target, we need to verify the properties in the previous paragraph. In other words, we prove the following theorem.
Theorem 1
Let \(N=2n-1\), \(n\in \mathbb {N}\).
\(\mathrm {(i)}\) In each \([k]_{N}^{\textrm{odd}}\), \(k=1,\ldots ,N\), except for the pair (kk), any number in \(\{1,\ldots ,N\}\setminus \{k\}\) appears once and only once. This implies that different elements in \([k]_{N}^{\textrm{odd}}\) do not share a common number.
\(\mathrm {(ii)}\) For \(k, k^\prime =1,\ldots ,N\) and \(k\not =k^\prime \), we have \([k]_N^{\textrm{odd}} \cap [k^\prime ]_N^{\textrm{odd}} = \emptyset \).
Proof
\(\mathrm {(i)}\) Note that for each \(k=1,\ldots ,N\), the set \([k]_N^{\text {odd}}\) contains exact n pairs and 2n numbers including k twice. Thus, “any number in \(\{1,\ldots ,N\}\) appears" is equivalent to “any number in \(\{1,\ldots ,N\}\setminus \{k\}\) appears once and only once". Then, it is sufficient to prove that for each \(\ell =1,\ldots ,N\), and \(k=1,\ldots ,N\), there exists \(j_{k\ell }\in \{0,\ldots ,n-1\}\) such that either \([k-j_{k\ell }]_N = \ell \) or \([k+j_{k\ell }]_N = \ell \). This is verified by the following explicit choice of \(j_{k\ell }\):
$$\begin{aligned} j_{k\ell } = \left\{ \begin{aligned}&k-\ell , & 0\le k-\ell \le n-1, \\&N-(k-\ell ), & n\le k-\ell \le 2n-2, \\&\ell -k, & -n+1\le k-\ell \le -1, \\&N-(\ell -k), & -2n+1 \le k-\ell \le -n. \end{aligned} \right. \end{aligned}$$
\(\mathrm {(ii)}\) We prove by contradiction. Without loss of generality, we assume that there exist two integers \(k>k^\prime \) such that \([k]_N^{\text {odd}} \cap [k^\prime ]_N^{\text {odd}} \not = \emptyset \). According to the definition of the sets, we have either
$$\begin{aligned} \left\{ \begin{aligned}&[k-j]_N = [k^\prime -j^\prime ]_N, \\&[k+j]_N = [k^\prime +j^\prime ]_N, \end{aligned} \right. \quad \text {or }\quad \left\{ \begin{aligned}&[k-j]_N = [k^\prime +j^\prime ]_N, \\&[k+j]_N = [k^\prime -j^\prime ]_N, \end{aligned} \right. \end{aligned}$$
for some integers \(j,j^\prime =0,\ldots ,n-1\). In both cases, we obtain
$$\begin{aligned} [k-j]_N + [k+j]_N = [k^\prime +j^\prime ]_N + [k^\prime -j^\prime ]_N, \end{aligned}$$
that is,
$$\begin{aligned} k-j+k+j + mN = k^\prime -j^\prime +k^\prime +j^\prime + m^\prime N, \end{aligned}$$
equivalently,
$$\begin{aligned} 2(k-k^\prime ) = (m^\prime -m) N, \end{aligned}$$
for some \(m,m^\prime \in \mathbb {Z}\). Since \(N=2n-1\) is an odd number, we conclude that \(m^\prime -m\) is even, and hence there exists \({\tilde{m}}\in \mathbb {Z}\) such that \(k-k^\prime ={\tilde{m}} N\). This implies \(k=k^\prime \), which is a contradiction. \(\square \)
On the other hand, for \(N=2n\), \(n\in \mathbb {N}\), by the definition of the sets, it is readily to see that each set (except the last one) includes n pairs, and hence, 2n numbers are referred in total, and the last one includes N pairs: \(\{(j,j)\}_{j=1,\ldots ,N}\). We employ the above theorem to prove the following theorem for an even number.
Theorem 2
Let \(N=2n\), \(n\in \mathbb {N}\).
\(\mathrm {(i)}\) In each \([k]_{N}^{\textrm{even}}\), \(k=1,\ldots ,N-1\), any number in \(\{1,\ldots ,N\}\) appears once and only once. This implies that different elements in \([k]_{N}^{\textrm{even}}\) do not share a common number.
\(\mathrm {(ii)}\) For \(k, k^\prime =1,\ldots ,N\) and \(k\not =k^\prime \), we have \([k]_N^{\textrm{even}} \cap [k^\prime ]_N^{\textrm{even}} = \emptyset \).
Proof
\(\mathrm {(i)}\) In \([N]_{N}^{\text {even}}\), different elements do not share a common number by the definition. For \(k=1,\ldots ,N-1\), by Theorem 1(i), any number in \(\{1,\ldots ,N-1\}\setminus \{k\}\) appears once and only once in \([k]_{N-1}^{\text {odd}}\setminus \{(k,k)\}\). Therefore, we find that any number in \(\{1,\ldots ,N\}\) appears once and only once in \([k]_{N}^{\text {even}}\) by the definition.
\(\mathrm {(ii)}\) If \(k=N\) (or \(k^\prime =N\)), then it is trivial because \([k^\prime ]_N^{\text {even}}\) (or \([k]_N^{\text {even}}\)) has no pairs with the same number for any \(k^\prime =1,\ldots ,N-1\). For \(1\le k^\prime \not =k\le N-1\), by Theorem 1(ii), we have \([k]_{N-1}^{\text {odd}} \cap [k^\prime ]_{N-1}^{\text {odd}} = \emptyset \). By the definition of \([k]_N^{\text {even}}\), it is sufficient to check \((k,N)\notin [k^\prime ]_{N-1}^{\text {odd}}\) for any \(1\le k^\prime \not = k\le N-1\), and \((k,N)\not =(k^\prime ,N)\). These are trivial because there are no pairs with number N in \([k^\prime ]_{N-1}^{\text {odd}}\), and \(k\not =k^\prime \). \(\square \)
We show the structure using our constructions with linear depth for \(N_e=5\) and \(N_e=6\) in Fig. 17.
Fig. 17
Realization of linear depth for two-particle interaction potential using illustrative examples of \(N_e=5,6\). The gate operations surrounded by the dashed lines can be executed in parallel
Full size image

Implementation of distance register

Recall that \(\Delta x_i = L_i/2^n\), \(i=1,\ldots ,d\). For simplicity, we choose the same value for each dimension: \(L_i=L\) and \(\Delta x = L/2^n\). The distance in one dimension is proportional to the difference between two integers, which can be efficiently realized by a quantum comparator, a Fredkin gate, and an in-place modular subtractor [18, Fig. 7]. Thus, we only consider the case of \(d\ge 2\) in the following contexts.
Electron–nucleus distance We assume that the known position vector of the v-th nucleus is expressed by
$$\begin{aligned} \varvec{R}_v = \tilde{\varvec{k}}_v \Delta x = \sum _{i=1}^d \Delta x{\tilde{k}}_{v,i} \varvec{e}_i, \end{aligned}$$
where \(\tilde{\varvec{k}}_v = ({\tilde{k}}_{v,1}, \ldots , {\tilde{k}}_{v,d})^{\textrm{T}}\), \(v=1,\ldots , N_{\text {nuc}}\) is an integer-valued vector. Then, the operation \(U_{\text {dis},v}(\varvec{R}_v)\) aims at
$$\begin{aligned} U_{\text {dis},v}(\varvec{R}_v) \left( {|{\varvec{k}_\ell }\rangle }_{dn} \otimes {|{0}\rangle }_{n_{\text {anc}}}\right) = {|{r_{\ell }(\varvec{R}_v)}\rangle }_{n_{\text {dis}}} \otimes {|{\varvec{q}_{\ell ,v}}\rangle }_{dn+n_{\text {anc}}-n_{\text {dis}}}, \end{aligned}$$
(C7)
for \(\ell =1,\ldots , N_{\text {e}}\). Here, \({|{r_{\ell }(\varvec{R}_v)}\rangle }\) is the electron–nucleus distance register with \(r_{\ell }(\varvec{R}_v):= \sum _{i=1}^d (k_{\ell ,i}-{\tilde{k}}_{v,i})^2\). To be precise, this should be called a squared distance register because we have
$$\begin{aligned} r_{\ell }(\varvec{R}_v) = |\varvec{r}_\ell - \varvec{R}_v|^2/(\Delta x)^2. \end{aligned}$$
In some papers, the distance register is defined as \({|{\sqrt{|\varvec{r}_\ell - \varvec{R}_v|^2}}\rangle }\) instead, but such a non-integer register needs more ancillary qubits and extra discussion on the error, which we try to avoid in this paper. A non-integer distance register is compatible with the method APK, and we will discuss it in detail in future works. In Eq. (C7), \({|{\varvec{q}_{\ell ,v}}\rangle }\) is a quantum state depending on \(\varvec{k}_\ell \) and \(\tilde{\varvec{k}}_v\), and \(n_{\text {anc}}\) is the number of ancillary qubits to achieve such an operation. Moreover, since \(r_{\ell }(\varvec{R}_v)\) takes the value of a non-negative integer smaller than \(d(N-1)^2\), it is enough to take \(n_{\text {dis}} = 2n +\lceil \log _2 d\rceil \).
Electron–electron distance An electron–electron distance register \({|{r_{\ell ,\ell ^\prime }}\rangle }\) is similarly defined using \(r_{\ell ,\ell ^\prime }:= \sum _{i=1}^d (k_{\ell ,i}-k_{\ell ^\prime ,i})^2\), and \(U_{\text {dis}}\) is an operation aiming at
$$\begin{aligned} U_{\text {dis}} \left( {|{\varvec{k}_{\ell ^\prime }}\rangle }_{dn} \otimes {|{\varvec{k}_{\ell }}\rangle }_{dn} \otimes {|{0}\rangle }_{n_{\text {anc}}}\right) = {|{r_{\ell ,\ell ^\prime }}\rangle }_{n_{\text {dis}}} \otimes {|{\varvec{q}_{\ell ,\ell ^\prime }}\rangle }_{2dn+n_{\text {anc}}-n_{\text {dis}}}. \end{aligned}$$
The operations \(U_{\text {dis},v}(\varvec{R}_v)\), and \(U_{\text {dis}}\) can be realized by arithmetic operations, including absolute subtraction, square, and addition, as shown in Fig. 18. For each \(i=1,\ldots ,d\), the absolute subtractions \(U_{\text {sub},v}^{(i)}\) and \(U_{\text {sub}}\) can be constructed by the circuits in Fig. 19, and the square operation \(U_{\text {sqr}}\) is given in Fig. 20.
Fig. 18
Implementations of \(U_{\text {dis},v}(\varvec{R}_v)\), and \(U_{\text {dis}}\) by using arithmetic operations for \(d=3\). Here, \(U_{\text {sub},v}^{(i)}\) and \(U_{\text {sub}}\) are quantum absolute value subtractors, \(U_{\text {sqr}}\) is a quantum circuit of squaring, and \(A_k\) denotes a quantum adder of two k-qubit registers with a carry qubit in the center
Full size image
Fig. 19
Implementations of \(U_{\text {sub},v}^{(i)}\), and \(U_{\text {sub}}\) by modular subtractors \(U_{\text {msub}}({\tilde{k}}_{v,i})\), \(U_{\text {msub}}\), CNOT gates, and a controlled increment gate. Here, the CNOT gate denotes n CNOT gates controlled by the ancillary qubit and targeted at the n qubits in the first register
Full size image
Fig. 20
Implementation of \(U_{\text {sqr}}\) based on schoolbook method in an example of \(n=4\). Here, \(A_k\) denotes the quantum adder used in Fig. 18
Full size image
For the implementation in Fig. 18, we have \(n_{\text {anc}} = d+2nd+d(d-1)/2 = 2dn + d(d+1)/2\). Moreover, if we employ the (controlled) modular adder/subtractor in [33], the modular subtractor with a given integer in [34], and the controlled increment gate in [34], then the gate count is \(\mathcal {O}(dn^2)\) with depth \(\mathcal {O}(n^2+dn)\). In the case of \(d=2\), instead of Fig. 18, one can alternatively employ a subtractor with a reversible squaring and sum-of-squares unit in [50] using more ancillary qubits of order \(\mathcal {O}(n^2)\).
On the other hand, as we try to minimize the number of ancillary qubits, we provide alternative implementations of operations \(U_{\text {dis},v}(\varvec{R}_v)\), and \(U_{\text {dis}}\) using QFTs and controlled polynomial phase gates. The previous papers [34, 5153] applied QFTs and phase gates to construct the quantum adder, multiplier, etc. Intrigued by these papers, we use a similar idea and propose circuits deriving a second-degree polynomial in the register, as shown in Fig. 21.
Fig. 21
Alternative implementations of \(U_{\text {dis},v}(\varvec{R}_v)\), and \(U_{\text {dis}}\) using QFTs and controlled polynomial phase gates. Here, \(U_{\text {ph}}\) denotes the polynomial phase gate defined in Appendix A.3. The number of zero-initialized ancillary qubits is \(2n+\lceil \log _2 d\rceil \)
Full size image
These circuits use only \(n_{\text {anc}}=n_{\text {dis}}=2n+\lceil \log _2 d\rceil \) ancillary qubits, which is smaller than those in Fig. 18. We verify the above circuit for \(U_{\text {dis},v}(\varvec{R}_v)\) by the definition of the polynomial phase gate. For simplicity, we denote \(n_{\text {dis}}\) by m and obtain
$$\begin{aligned}&{|{0}\rangle }^{\otimes m} \otimes {|{\varvec{k}_\ell }\rangle }_{dn} \xrightarrow {U_{\text {QFT}}} \frac{1}{\sqrt{2^{m}}} \sum _{q_0,\ldots ,q_{m-1}=0}^1 {|{q_{m-1}}\rangle } \otimes \cdots \otimes {|{q_0}\rangle } \otimes {|{\varvec{k}_\ell }\rangle }_{dn} \\&\quad \xrightarrow {\text {C}U_{\text {ph}}} \frac{1}{\sqrt{2^{m}}} \sum _{q_0,\ldots ,q_{m-1}=0}^1 \textrm{exp}\left( \textrm{i}2\pi q_0 \sum _{i=1}^d (k_{\ell ,i}-{\tilde{k}}_{v,i})^2/2^m\right) {|{q_{m-1}}\rangle } \otimes \cdots \otimes {|{q_0}\rangle } \otimes {|{\varvec{k}_\ell }\rangle }_{dn} \\&\quad \xrightarrow {\text {C}U_{\text {ph}}} \frac{1}{\sqrt{2^{m}}} \sum _{q_0,\ldots ,q_{m-1}=0}^1 \textrm{exp}\left( \textrm{i}2\pi (q_0+2q_1) \sum _{i=1}^d (k_{\ell ,i}-{\tilde{k}}_{v,i})^2/2^m\right) {|{q_{m-1}}\rangle } \otimes \cdots \otimes {|{q_0}\rangle } \otimes {|{\varvec{k}_\ell }\rangle }_{dn} \\&\quad \xrightarrow {\text {C}U_{\text {ph}}} \cdots \xrightarrow {\text {C}U_{\text {ph}}} \frac{1}{\sqrt{2^{m}}} \sum _{q=0}^{2^m-1} \textrm{exp}\left( \textrm{i}2\pi q \sum _{i=1}^d (k_{\ell ,i}-{\tilde{k}}_{v,i})^2/2^m\right) {|{q}\rangle }_m \otimes {|{\varvec{k}_\ell }\rangle }_{dn} \\&\quad \xrightarrow {U_{\text {QFT}}^\dag } \frac{1}{2^{m}} \sum _{q,q^\prime =0}^{2^m-1} \textrm{exp}\left( \textrm{i}2\pi q \left( \sum _{i=1}^d (k_{\ell ,i}-{\tilde{k}}_{v,i})^2-q^\prime \right) /2^m\right) {|{q^\prime }\rangle }_m \otimes {|{\varvec{k}_\ell }\rangle }_{dn} \\&\quad = {|{\sum _{i=1}^d (k_{\ell ,i}-{\tilde{k}}_{v,i})^2}\rangle }_m \otimes {|{\varvec{k}_\ell }\rangle }_{dn}. \end{aligned}$$
The last equation follows from the identity \(\sum _{q=0}^{2^m-1} \textrm{exp}\left( \textrm{i}2\pi q {\tilde{q}}/2^m\right) = 2^m \delta _{{\tilde{q}},0}\) provided that \({\tilde{q}}\) is an integer in \([-2^m+1,2^m-1]\). The verification for \(U_{\text {dis}}\) is similar.
Now, we discuss the detailed gate count and the circuit depth of Fig. 21. We note that the polynomial phase gate \(U_{\text {ph}}[-2\pi r_{\ell }(\varvec{R}_v)/2^m] = U_{\text {ph}}\left[ -2\pi \sum _{i=1}^d (k_{\ell ,i}-{\tilde{k}}_{v,i})^2/2^m\right] \) can be implemented by d times phase gates for quadratic functions \(-2\pi (x_i-{\tilde{k}}_{v,i})^2/2^m\) on n qubits, and \(U_{\text {QFT}}\) on a zero register is equivalent to the Hadamard gates. Thus, \(U_{\text {dis},v}(\varvec{R}_v)\) uses a global phase gate, \(4n+2\lceil \log _2 d\rceil \) Hadamard gates, \(dn(2n+\lceil \log _2 d\rceil )+(2n+\lceil \log _2 d\rceil )(2n+\lceil \log _2 d\rceil -1)/2\) controlled phase gates, and \(dn(n-1)(2n+\lceil \log _2 d\rceil )/2\) 2-controlled phase gates with depth \(1+8(2n+\lceil \log _2 d\rceil )-10+n(6n+2)\max \{d,2n+\lceil \log _2 d\rceil \}\). Here, the gate count and the depth of Fig. 21 are both \(\mathcal {O}(n^3)\), one order higher than the circuit in Fig. 18 due to the limited number of ancillary qubits. As for the operation \(U_{\text {dis}}\), if we regard \({|{k_{\ell ^\prime ,i}}\rangle }_n \otimes {|{k_{\ell ,i}}\rangle }_n\) as a new register \({|{{\hat{k}}_{\ell ,\ell ^\prime ,i}}\rangle }_{2n}\), then the polynomial phase gate \(U_{\text {ph}}[-2\pi r_{\ell ,\ell ^\prime }/2^m]\) can also be implemented by d times phase gates for quadratic functions on 2n qubits. Thus, \(U_{\text {dis}}\) uses a global phase gate, \(4n+2\lceil \log _2 d\rceil \) Hadamard gates, \(2dn(2n+\lceil \log _2 d\rceil )+(2n+\lceil \log _2 d\rceil )(2n+\lceil \log _2 d\rceil -1)/2\) controlled phase gates, and \(dn(2n-1)(2n+\lceil \log _2 d\rceil )\) 2-controlled phase gates with depth \(1+8(2n+\lceil \log _2 d\rceil )-10+2n(12n+2)\max \{d,2n+\lceil \log _2 d\rceil \}\).
For the implementation of distance register, there is a trade-off between the size of the quantum circuit and its depth. The circuit in Fig. 18 is shallower than the circuit in Fig. 21, while the latter one seems less entangled (i.e., requires much less qubits).
1.
2.
go back to reference Abrams, D.S., Lloyd, S.: Simulation of many-body Fermi systems on a universal quantum computer. Phys. Rev. Lett. 79, 2586 (1997). https://​doi.​org/​10.​1103/​PhysRevLett.​79.​2586ADSCrossRefMATH
3.
go back to reference Zalka, C.: Simulating quantum systems on a quantum computer. Proc. R. Soc. Lond. A 454, 313–322 (1998). https://​doi.​org/​10.​1098/​rspa.​1998.​0162ADSCrossRefMATH
4.
go back to reference Wiesner, S.: Simulations of Many-Body Quantum Systems by a Quantum Computer. Preprint arXiv:​quant-ph/​9603028. https://​doi.​org/​10.​48550/​arXiv.​quant-ph/​9603028
5.
go back to reference Kassal, I., Jordan, S.P., Love, P.J., Mohseni, M., Aspuru-Guzik, A.: Polynomial-time quantum algorithm for the simulation of chemical dynamics. PNAS 105(48), 18681–18686 (2008). https://​doi.​org/​10.​1073/​pnas.​0808245105ADSCrossRefMATH
6.
go back to reference Gingrich, R.M., Williams, C.P.: Non-unitary probabilistic quantum computing. In: WISICT ’04: Proceedings of the Winter International Synposium on Information and Communication Technologies, pp. 1–6 (2004). https://​doi.​org/​10.​5555/​984720.​984757
7.
go back to reference Terashima, H., Ueda, M.: Nonunitary quantum circuit. Int. J. Quantum Inf. 3(4), 633–647 (2005). https://​doi.​org/​10.​1142/​S021974990500145​6CrossRefMATH
8.
go back to reference McArdle, S., Jones, T., Endo, S., Li, Y., Benjamin, S.C., Yuan, X.: Variational ansatz-based quantum simulation of imaginary time evolution. npj Quantum Inf. 5, 75 (2019). https://​doi.​org/​10.​1038/​s41534-019-0187-2ADSCrossRefMATH
9.
go back to reference Motta, M., Sun, C., Tan, A.T.K., O’Rourke, M.J., Ye, E., Minnich, A.J., Brandão, F.G.S.L., Chan, G.K.-L.: Determining eigenstates and thermal states on a quantum computer using quantum imaginary time evolution. Nat. Phys. 16, 205–210 (2020). https://​doi.​org/​10.​1038/​s41567-019-0704-4CrossRef
10.
go back to reference Kosugi, T., Nishiya, Y., Nishi, H., Matsushita, Y.: Imaginary-time evolution using forward and backward real-time evolution with a single ancilla: first-quantized eigensolver algorithm for quantum chemistry. Phys. Rev. Res. 4, 033121 (2022). https://​doi.​org/​10.​1103/​PhysRevResearch.​4.​033121CrossRef
11.
go back to reference Schuch, N.: Implementation of quantum algorithms with Josephson charge qubits. Ph.D. Thesis Universität Regensburg (2002)
12.
go back to reference Schuch, N., Siewert, J.: Programmable networks for quantum algorithms. Phys. Rev. Lett. 91, 027902 (2003). https://​doi.​org/​10.​1103/​PhysRevLett.​91.​027902ADSCrossRefMATH
13.
go back to reference Bullock, S.S., Markov, I.L.: Asymptotically optimal circuits for arbitrary n-qubit diagonal computations. Quantum Inf. Comput. 4(1), 27–47 (2004). https://​doi.​org/​10.​5555/​2011572.​2011575MathSciNetCrossRefMATH
14.
go back to reference Welch, J., Greenbaum, D., Mostame, S., Aspuru-Guzik, A.: Efficient quantum circuits for diagonal unitaries without ancillas. New J. Phys. 16, 033040 (2014). https://​doi.​org/​10.​1088/​1367-2630/​16/​3/​033040ADSCrossRefMATH
15.
go back to reference Gray, F.: Pulse code communication. United States Patent Number 2632058 (1953). https://​patents.​google.​com/​patent/​US2632058A/​en
16.
go back to reference Beauchamp, K.G.: Applications of Walsh and Related Functions With an Introduction to Sequency Theory. Academic Press, New York (1984)MATH
17.
go back to reference Zhang, S., Huang, K., Li, L.: Depth-optimized quantum circuit synthesis for diagonal unitary operators with asymptotically optimal gate count. Phys. Rev. A 109, 042601 (2024). https://​doi.​org/​10.​1103/​PhysRevA.​109.​042601ADSMathSciNetCrossRef
18.
go back to reference Huang, X., Kosugi, T., Nishi, H., Matsushita, Y.: Optimized synthesis of circuits for diagonal unitary matrices with reflection symmetry. J. Phys. Soc. Jpn. 93, 054002 (2024). https://​doi.​org/​10.​7566/​JPSJ.​93.​054002ADSCrossRefMATH
19.
go back to reference Beer, K., Dziemba, F.A.: Phase-context decomposition of diagonal unitaries for higher-dimensional systems. Phys. Rev. A 93, 052333 (2016). https://​doi.​org/​10.​1103/​PhysRevA.​93.​052333ADSCrossRef
20.
go back to reference Welch, J., Bocharov, A., Svore, K.M.: Efficient Approximation of diagonal unitaries over the Clifford\(+\)T basis. Quantum Inf. Comput. 16(1–2), 87–104 (2016). https://​doi.​org/​10.​5555/​3179320.​3179326
21.
go back to reference Benenti, G., Strini, G.: Quantum simulation of the single-particle Schrödinger equation. Am. J. Phys. 76, 657–662 (2008). https://​doi.​org/​10.​1119/​1.​2894532ADSCrossRefMATH
22.
go back to reference Ollitrault, P.J., Mazzola, G., Tavernelli, I.: Nonadiabatic molecular quantum dynamics with quantum computers. Phys. Rev. Lett. 125, 260511 (2020). https://​doi.​org/​10.​1103/​PhysRevLett.​125.​260511ADSCrossRefMATH
23.
go back to reference Kosugi, T., Nishi, H., Matsushita, Y.: Exhaustive search for optimal molecular geometries using imaginary-time evolution on a quantum computer. npj Quantum Inf. 9, 112 (2023). https://​doi.​org/​10.​1038/​s41534-023-00778-ADSCrossRefMATH
24.
go back to reference Jones, N.C., Whitfield, J.D., McMahon, P.L., Yung, M.-H., Meter, R.V., Aspuru-Guzik, A., Yamamoto, Y.: Faster quantum chemistry simulation on fault-tolerant quantum computers. New J. Phys. 14, 115023 (2012). https://​doi.​org/​10.​1088/​1367-2630/​14/​11/​115023CrossRefMATH
25.
go back to reference Babbush, R., Gidney, C., Berry, D.W., Wiebe, N., McClean, J., Paler, A., Fowler, A., Neven, H.: Encoding electronic spectra in quantum circuits with linear T complexity. Phys. Rev. X 8(4), 041015 (2018). https://​doi.​org/​10.​1103/​PhysRevX.​8.​041015CrossRefMATH
26.
go back to reference Häner, T., Roetteler, M., Svore, K.M.: Optimizing Quantum Circuits for Arithmetic. Preprint. https://​doi.​org/​10.​48550/​arXiv.​1805.​12445. arXiv:​1805.​12445
27.
go back to reference Sanders, Y.R., Berry, D.W., Costa, P.C.S., Tessler, L.W., Wiebe, N., Gidney, C., Neven, H., Babbush, R.: Compilation of fault-tolerant quantum heuristics for combinatorial optimization. PRX Quantum 1(2), 020312 (2020). https://​doi.​org/​10.​1103/​PRXQuantum.​1.​020312CrossRef
28.
go back to reference Sornborger, A.: Quantum simulation of tunneling in small systems. Sci. Rep. 2, 597 (2012). https://​doi.​org/​10.​1038/​srep00597CrossRefMATH
29.
go back to reference Sheridan, L., Maslov, D., Mosca, M.: Approximating fractional time quantum evolution. J. Phys. A Math. Theor. 42(18), 185302 (2009). https://​doi.​org/​10.​1088/​1751-8113/​42/​18/​185302ADSMathSciNetCrossRefMATH
30.
go back to reference Ramos-Calderer, S.: Efficient quantum interpolation of natural data. Phys. Rev. A 106, 062427 (2022). https://​doi.​org/​10.​1103/​PhysRevA.​106.​062427ADSMathSciNetCrossRefMATH
31.
go back to reference Moosa, M., Watts, T.W., Chen, Y., Sarma, A., McMahon, P.L.: Linear-depth quantum circuits for loading Fourier approximations of arbitrary functions. Quantum Sci. Technol. 9(1), 015002 (2024). https://​doi.​org/​10.​1088/​2058-9565/​acfc62ADSCrossRef
32.
go back to reference Berry, D.W., Kieferová, M., Scherer, A., Sanders, Y.R., Low, G.H., Wiebe, N., Gidney, C., Babbush, R.: Improved techniques for preparing eigenstates of fermionic Hamiltonians. npj Quantum Inf. 4(1), 22 (2018). https://​doi.​org/​10.​1038/​s41534-018-0071-5ADSCrossRefMATH
33.
go back to reference Li, H., Fan, P., Xia, H., Peng, H., Long, G.: Efficient quantum arithmetic operation circuits for quantum image processing. Sci. China Phys. Mech. Astron. 63(8), 280311 (2020). https://​doi.​org/​10.​1007/​s11433-020-1582-8ADSCrossRefMATH
34.
go back to reference Yuan, Y., Wang, C., Wang, B., Chen, Z., Dou, M., Wu, Y., Guo, G.: An improved QFT-based quantum comparator and extended modular arithmetic using one ancilla qubit. New J. Phys. 25, 103011 (2023). https://​doi.​org/​10.​1088/​1367-2630/​acfd52ADSMathSciNetCrossRefMATH
35.
go back to reference Agarwal, R.P., Wong, P.J.Y.: Optimal error bounds for the derivatives of two point Hermite interpolation. Comput. Math. Appl. 21(8), 21–35 (1991). https://​doi.​org/​10.​1016/​0898-1221(91)90048-9MathSciNetCrossRefMATH
36.
go back to reference Silva, A.J., Park, D.K.: Linear-depth quantum circuits for multiqubit controlled gates. Phys. Rev. A 106, 042602 (2022). https://​doi.​org/​10.​1103/​PhysRevA.​106.​042602ADSMathSciNetCrossRef
37.
go back to reference Childs, A.M., Leng, J., Li, T., Liu, J.-P., Zhang, C.: Quantum simulation of real-space dynamics. Quantum 6, 860 (2022). https://​doi.​org/​10.​22331/​q-2022-11-17-860CrossRefMATH
38.
go back to reference Chan, H.H.S., Meister, R., Jones, T., Tew, D.P., Benjamin, S.C.: Grid-based methods for chemistry simulations on a quantum computer. Sci. Adv. 9, eabo7484 (2023). https://​doi.​org/​10.​1126/​sciadv.​abo7484CrossRefMATH
39.
go back to reference Jahnke, T., Lubich, C.: Error bounds for exponential operator splittings. BIT Numer. Math. 40(4), 735–744 (2000). https://​doi.​org/​10.​1023/​A:​1022396519656MathSciNetCrossRefMATH
40.
go back to reference Nielsen, M.A., Chuang, I.L.: Quantum Computation and Quantum Information, 10th edn. Cambridge University Press, Cambridge (2010). https://​doi.​org/​10.​1017/​CBO9780511976667​CrossRefMATH
41.
go back to reference Barenco, A., Bennett, C.H., Cleve, R., DiVincenzo, D.P., Margolus, N., Shor, P., Sleator, T., Smolin, J.A., Weinfurter, H.: Elementary gates for quantum computation. Phys. Rev. A 52, 3457 (1995). https://​doi.​org/​10.​1103/​PhysRevA.​52.​3457ADSCrossRefMATH
42.
go back to reference Gidney, C.: Constructing Large Increment Gates. http://​cs.​stackexchange.​com/​questions/​40933/​ (2015). Accessed 28 Oct 2024
43.
go back to reference Hall, C.A., Meyer, W.W.: Optimal error bounds for cubic spline interpolation. J. Approx. Theory 16, 105–122 (1976). https://​doi.​org/​10.​1088/​1367-2630/​14/​11/​115023MathSciNetCrossRefMATH
44.
go back to reference Gidney, C.: Efficient controlled phase gradients. https://​algassert.​com/​post/​1708 (2017). Accessed 28 Oct 2024
45.
go back to reference Amy, M., Azimzadeh, P., Mosca, M.: On the controlled-NOT complexity of controlled-NOT-phase circuits. Quantum Sci. Technol. 4, 015002 (2019). https://​doi.​org/​10.​1088/​2058-9565/​aad8caADSCrossRefMATH
46.
go back to reference Dubeau, F., Savoie, J.: Optimal error bounds for quadratic spline interpolation. J. Math. Anal. Appl. 198(1), 49–63 (1996). https://​doi.​org/​10.​1006/​jmaa.​1996.​0067MathSciNetCrossRefMATH
47.
go back to reference Häner, T., Roetteler, M., Svore, K.M.: Factoring using \(2n+2\) qubits with Toffoli based modular multiplication. Quantum Inf. Comput. 17(7–8), 673–684 (2017). https://​doi.​org/​10.​5555/​3179553.​3179560
48.
go back to reference Takahashi, Y., Tani, S., Kunihiro, N.: Quantum addition circuits and unbounded fan-out. Quantum Inf. Comput. 10(9), 872–890 (2010). https://​doi.​org/​10.​5555/​2011464.​2011476MathSciNetCrossRefMATH
49.
go back to reference Qiskit contributors. Qiskit: An Open-source Framework for Quantum Computing (2023). https://​doi.​org/​10.​5281/​zenodo.​2573505
50.
go back to reference Nagamani, A.N., Ramesh, C., Agrawal, V.K.: Design of optimized reversible squaring and sum-of-squares units. Circuits Syst. Signal Process 37, 1753–1776 (2018). https://​doi.​org/​10.​1007/​s00034-017-0631-5MathSciNetCrossRefMATH
52.
go back to reference Ruiz-Perez, L., Garcia-Escartin, J.C.: Quantum arithmetic with the quantum Fourier transform. Quantum Inf. Process. 16, 152 (2017). https://​doi.​org/​10.​1007/​s11128-017-1603-1ADSMathSciNetCrossRefMATH
53.
go back to reference Şahin, E.: Quantum arithmetic operations based on quantum Fourier transform on signed integers. Int. J. Quantum Inf. 18(6), 2050035 (2020). https://​doi.​org/​10.​1142/​S021974992050035​5MathSciNetCrossRefMATH