A quantum speedup in machine learning: finding an N-bit Boolean function for a classification

Seokwon Yoo; Jeongho Bang; Changhyoup Lee; Jinhyoung Lee

doi:10.1088/1367-2630/16/10/103014

1. Introduction

Over the past few decades, quantum physics has brought remarkable innovations into fields of various disciplines. For example, there are exponentially faster quantum algorithms, compared to their classical counterparts [1–3]. The physical limit of measurement precision has been improved in quantum metrology [4, 5], and a large number of protocols offering higher security have been proposed in quantum cryptography [6, 7]. These achievements are enabled by appropriate usage of quantum effects such as quantum superposition and quantum entanglement.

Another important scientific area is machine learning, which is a subfield of artificial intelligence and one of the most advanced automatic control techniques. While learning is usually regarded as a characteristic of humans or living things, machine learning enables a machine to learn a task [8]. Machine learning has been attracting great attention, with its novel ability to learn. On one hand, machine learning has been studied to provide an understanding of the learning of a real biological system, in a theoretical manner. On the other hand, machine learning is also expected to provide reliable control techniques for use in designing complex systems in a practical manner [8].

Recently, the hybridizing of the two scientific fields described above, quantum technology and machine learning, has received great interest [9–12]. One question naturally arises: can machine learning be improved by using favorable quantum effects? Several attempts to answer this question have been made in the past few years—for example, using quantum perceptrons [13], neural networks [14–16], and quantum-inspired evolutionary algorithms [17, 18]. Most recently, remarkable studies have been carried out [19–22]. In [19], a learning speedup for the quantum machine was observed with a lower memory requirement for a specific example, namely the kth-root NOT operation. In [20], a strategy for designing a quantum algorithm was introduced, establishing a link between the learning speedup and the speedup of the quantum algorithm found. In [21, 22], the authors showed quantum speedup for the task of classifying a large number of data. However, it is still unclear what quantum effects work in machine learning and how they work, particularly in the absence of a fair comparison between classical and quantum machines.

In this work, we consider a binary classification problem as a learning task. Such a classification can be realized for an N-bit Boolean function that maps a set of N-bit binary strings in ${{\{0,1\}}^{N}}$ into $\{0,1\}$ [23]. The main objective in this paper is to compare a quantum machine with a classical machine. These two machines are equivalent. The only differentiation is that the quantum machine can deal with quantum effects, whereas the classical machine cannot. The machines are analyzed in terms of the acceptable region, defined as a localized solution region of parameter space. In the analysis, it is shown that the quantum machine can learn faster due to the acceptable region being expanded by quantum superposition. Such a quantum learning speedup is understood in terms of an expansion of the acceptable region. In order to make the analysis more explicit, we analyze further by using random search, which is a standard model for use in learning performance analysis [24]. In such a primitive model, we validate the quantum speedup, showing that the overall number of iterations required to complete the learning is proportional to ${\rm O}({{{\rm e}}^{\alpha D}})$ , with $\alpha \simeq 3.065$ for the classical machine and $\alpha \simeq 0.238$ for the quantum machine. Here, D is the size of the search space. Differential evolution is employed as a learning model, taking into account more realistic circumstances. By means of numerical simulations, we show that the quantum speedup is still observed even in such a case.

2. Classical and quantum machines

Machine learning can be decomposed into two parts: the machine and the feedback. The machine performs various tasks depending on its internal parameters, and the feedback adjusts the parameters of the machine in order for the machine to perform a required task called the target. Learning is a process involving finding suitable parameters for the machine, whereby the machine is expected to generate desired results working towards a target⁴ . This concept of machine learning has been widely adopted in the context of machine learning at the fundamental level [8].

In this work, we assign to a machine a binary classification problem as a task, where the machine will learn a target N-bit Boolean function, defined as

$\begin{eqnarray}&&f:{\boldsymbol{x}} \in {{\{0,1\}}^{N}}\to y\in \{0,1\},\end{eqnarray} \tag{ 1 }$

where ${\boldsymbol{x}} ={{x}_{N}}...{{x}_{2}}{{x}_{1}}$ is represented as an N-bit string of ${{x}_{j}}\in \{0,1\}$ ( $j=1,2,\ldots ,N$ ). This function can be written by using the positive polarity Reed–Muller expansion [25]:

$\begin{eqnarray}&&f({\boldsymbol{x}} )={{a}_{0}}\oplus {{a}_{1}}{{x}_{1}}\oplus {{a}_{2}}{{x}_{2}}\oplus {{a}_{3}}{{x}_{1}}{{x}_{2}}\oplus \cdots \oplus {{a}_{{{2}^{N}}-1}}{{x}_{1}}\cdots {{x}_{N}}=\mathop{\oplus }\limits_{k=0}^{{{2}^{N}}-1}\left( {{a}_{k}}\mathop{\prod }\limits_{j\in {{{\rm c}}_{k}}}{{x}_{j}} \right),\end{eqnarray} \tag{ 2 }$

where ⊕ denotes modulo-2 addition, ⊕ means a direct sum of the moduli, and the Reed–Muller coefficients a_k are either 0 or 1. Here, ${{{\rm c}}_{k}}$ is an index set whose elements are given in such a way. The number j is then taken to be an element of ${{{\rm c}}_{k}}$ only if k_j is equal to 1 when k is written as an N-bit string, ${{k}_{N}}...{{k}_{2}}{{k}_{1}}$ . Thus, each set of $\{{{a}_{k}}\}$ corresponds to each of ${{2}^{{{2}^{N}}}}$ Boolean functions.

The Boolean function can be implemented by a reversible circuit as shown in figure 1, where an additional bit channel, called the work channel, and controlled operations are employed [26, 27]. A single-bit operation G₀ is placed on the work channel and $({{2}^{N}}-1)$ controlled-G_k operations are caused to act on the work channel when all the control bits, x_j ( $j\in {{{\rm c}}_{k}}$ ), are 1. The input signal c on the work channel is fixed at 0. The operation G_k is given as either the identity (i.e., doing nothing), if a_k = 0, or NOT (i.e., flipping an input bit to its complement bit), if a_k = 1. As an example, a one-bit Boolean function (i.e., with N = 1) has ${{2}^{{{2}^{1}}}}=4$ sets of Reed–Muller coefficients (a₀, a₁), which determine all possible Boolean functions. Table 1 gives four possible one-bit Boolean functions with Reed–Muller coefficients and the corresponding operations.

**Figure 1.** The N-bit Boolean function is implemented by a reversible circuit. The machine consists of N-bit input channels and a single-bit work channel, which contains ${{2}^{N}}$ operations: one single-bit operation G₀, and ${{2}^{N}}-1$ operations G_k conditioned by the input bits ${\boldsymbol{x}}$ . Here, the constant input c is set to be 0, which gives rise to an output bit y.
Download figure:
Standard image High-resolution image

Table 1. Four possible one-bit Boolean functions are given with Reed–Muller coefficients (a₀ and a₁), and operations (G₀ and G₁). These are common to the classical and quantum cases.

Boolean function	a₀	a₁	${{G}_{0}}$	${{G}_{1}}$
${{f}_{1}}:x\mapsto 0$	0	0	Identity	Identity
${{f}_{2}}:x\mapsto 1$	1	0	NOT	Identity
${{f}_{3}}:x\mapsto x$	0	1	Identity	NOT
${{f}_{4}}:x\mapsto x\oplus 1$	1	1	NOT	NOT

With a reversible circuit model, we then define classical and quantum machines. The classical machine consists of classical channels and operations, and the Boolean function of the classical machine is described as

$\begin{eqnarray}&&\left( {\boldsymbol{x}} ,c \right)\mathop{\to }\limits^{f}\left( {\boldsymbol{x}} ,c\oplus y \right)\end{eqnarray} \tag{ 3 }$

with classical bits ${\boldsymbol{x}}$ , y, and c. We suppose the Reed–Muller coefficients a_k to be probabilistically determined by the internal parameters p_k, which implies that G_k performs the identity and NOT operations with probabilities p_k and $1-{{p}_{k}}$ , respectively. These probabilistic operations are primarily intended to provide a fair comparison with the quantum machine, which naturally employs a probabilistic operation. Now, we construct the quantum machine by setting only the work channel to be quantum. The input channels are left as classical, as the input information is classical in our work. Thus, the Boolean function of the quantum machine is described as

$\begin{eqnarray}&&\left( {\boldsymbol{x}} ,|c\rangle \right)\mathop{\to }\limits^{f}\left( {\boldsymbol{x}} ,|\psi \rangle \right),\end{eqnarray} \tag{ 4 }$

where the signal on the work channel is encoded into a qubit state. The classical probabilistic operations G_k are also necessarily replaced by unitary operators:

$\begin{eqnarray}{{\hat{G}}_{k}}=\left( \begin{array}{ccccccccccccccc} \sqrt{{{p}_{k}}} & {{{\rm e}}^{{\rm i}{{\phi }_{k}}}}\sqrt{1-{{p}_{k}}} \\ {{{\rm e}}^{-{\rm i}{{\phi }_{k}}}}\sqrt{1-{{p}_{k}}} & -\sqrt{{{p}_{k}}} \\ \end{array} \right),\end{eqnarray} \tag{ 5 }$

where p_k is the probability of ${{\hat{G}}_{k}}$ performing the identity operation, i.e., $|0\rangle \to |0\rangle$ , $|1\rangle \to {{{\rm e}}^{{\rm i}\pi }}|1\rangle$ , and $1-{{p}_{k}}$ is that of ${{\hat{G}}_{k}}$ performing the NOT operation, i.e., $|0\rangle \to {{{\rm e}}^{-{\rm i}{{\phi }_{k}}}}|1\rangle$ , $|1\rangle \to {{{\rm e}}^{{\rm i}{{\phi }_{k}}}}|0\rangle$ . Note that the relative phases ϕ_k are free parameters suitably chosen before the learning. The feedback adjusts only the p_k parameters, controllable both in the classical and the quantum experimental setups [28, 29].

These classical and quantum machines are equivalent to each other. They have the same circuit structures and exactly the same number of control parameters, p_k. Moreover, the single classical operation G_k and the quantum operator ${{\hat{G}}_{k}}$ cannot be discriminated between by measuring the distribution of outcomes for the same input ${\boldsymbol{x}}$ and parameters p_k.

3. The acceptable region

A target Boolean function is represented by a point, ${{{\rm Q}}_{f}}=({{p}_{0}},{{p}_{1}},\ldots ,{{p}_{{{2}^{N}}-1}})$ , in the ${{2}^{N}}$ -dimensional search space spanned by the probabilities, p_k. For example, four possible learning targets, f_j ( $j=1,2,3,4$ ), for the one-bit Boolean function, correspond to four points in the search space: ${{{\rm Q}}_{{{f}_{1}}}}=(1,1)$ , ${{{\rm Q}}_{{{f}_{2}}}}=(0,1)$ , ${{{\rm Q}}_{{{f}_{3}}}}=(1,0)$ , and ${{{\rm Q}}_{{{f}_{4}}}}=(0,0)$ . Similarly, the machine behavior is also characterized as a point ${{{\rm Q}}_{{\rm m}}}=({{p}_{0}},{{p}_{1}})$ , i.e., the respective points lead to different probabilistic tasks that the machine performs. Learning is simply regarded as a process of moving ${{{\rm Q}}_{{\rm m}}}$ to a given target point in the whole search space. It is, however, usually impractical (actually, impossible in real circumstances) to locate ${{{\rm Q}}_{{\rm m}}}$ exactly at the target point. Instead, it is feasible to find approximate solutions near to the exact target, i.e. the learning is expected to lead the point ${{{\rm Q}}_{{\rm m}}}$ into a region near to the target point [8]. We call such a region an acceptable region for the approximate target functions. As the learning time and convergence depend primarily on the size of the acceptable region, it is usually expected that a larger acceptable region will make the learning faster [30]. We examine, in this sense, the acceptable regions of classical and quantum machines.

The acceptable region is defined as a set of points which guarantee the errors, $\epsilon =1-\mathcal{F}$ , to be less than or equal to a tolerable value, _t. Here, $\mathcal{F}$ is the figure of merit of the machine performance, called the task fidelity, and it quantifies how well the machine performs a target function, defined by

$\begin{eqnarray}&&\mathcal{F}({{p}_{0}},{{p}_{1}},\ldots ,{{p}_{{{2}^{N}}-1}})={{\left( \mathop{\prod }\limits_{{\boldsymbol{x}} }\mathop{\sum }\limits_{y}\sqrt{P(y|{\boldsymbol{x}} ){{P}_{\tau }}(y|{\boldsymbol{x}} )} \right)}^{\frac{1}{{{2}^{N}}}}},\end{eqnarray} \tag{ 6 }$

where $P(y|{\boldsymbol{x}} )$ is a conditional probability of obtaining an output y given an input ${\boldsymbol{x}}$ , and the target probabilities ${{P}_{\tau }}(y|{\boldsymbol{x}} )$ are those for the target. For example, we have target probabilities for f₁ in table 1 written as

$\begin{eqnarray}&&{{P}_{\tau }}(0|0)=1,\;{{P}_{\tau }}(1|0)=0,\;{{P}_{\tau }}(0|1)=1,\;{\rm and}\;{{P}_{\tau }}(1|1)=0.\end{eqnarray} \tag{ 7 }$

The term ${{\sum }_{y}}\sqrt{P(y|{\boldsymbol{x}} ){{P}_{\tau }}(y|{\boldsymbol{x}} )}$ in equation (6) corresponds to the closeness of the two probability distributions $P(y|{\boldsymbol{x}} )$ and ${{P}_{\tau }}(y|{\boldsymbol{x}} )$ for the given ${\boldsymbol{x}}$ [31]. The task fidelity, $\mathcal{F}$ , increases as the outputs get close to the required outputs; $\mathcal{F}$ becomes unity only when the machine gives the target for all ${\boldsymbol{x}}$ , and otherwise is less than 1. The acceptable region can be seen as a set of probabilities, p_k, such that $1-{{\epsilon }_{t}}\leqslant \mathcal{F}({{p}_{1}},\ldots ,{{p}_{{{2}^{N}}-1}})$ , and thus higher $\mathcal{F}$ guarantees a wider acceptable region for a given tolerance, _t.

Let us begin with, as the simplest case, the target function f₁⁵ , a one-bit Boolean function, whose task fidelity, $\mathcal{F}({{p}_{0}},{{p}_{1}})$ , is reduced to

$\begin{eqnarray}&&\mathcal{F}({{p}_{0}},{{p}_{1}})=\sqrt[4]{P(0|0)P(0|1)},\end{eqnarray} \tag{ 8 }$

which is common to the classical and the quantum machines. For the classical machine, equation (8) is evaluated as

$\begin{eqnarray}&&{{\mathcal{F}}_{{\rm c}}}({{p}_{0}},{{p}_{1}})=\sqrt[4]{{{p}_{0}}({{p}_{0}}{{p}_{1}}+{{q}_{0}}{{q}_{1}})},\end{eqnarray} \tag{ 9 }$

adopting the conditional probabilities ${{P}_{{\rm c}}}(y|{\boldsymbol{x}} )$ given by

$\begin{eqnarray}&&{{P}_{{\rm c}}}(0|0)={{p}_{0}}{{p}_{1}}+{{p}_{0}}{{q}_{1}}={{p}_{0}},\;{{P}_{{\rm c}}}(0|1)={{p}_{0}}{{p}_{1}}+{{q}_{0}}{{q}_{1}},\end{eqnarray} \tag{ 10 }$

where ${{q}_{j}}=1-{{p}_{j}}$ (j = 0,1). For the quantum machine, the conditional probabilities ${{P}_{{\rm Q}}}(y|{\boldsymbol{x}} )$ differ slightly from ${{P}_{{\rm c}}}(y|{\boldsymbol{x}} )$ due to the superposition of ${{\hat{G}}_{0}}$ and ${{\hat{G}}_{1}}$ . The conditional probabilities ${{P}_{{\rm Q}}}(y|{\boldsymbol{x}} )$ are given as

$\begin{eqnarray}\begin{array}{rcl} {{P}_{{\rm Q}}}(0|0) & = & |\langle 0|{{{\hat{G}}}_{0}}|0\rangle {{|}^{2}}={{P}_{{\rm c}}}(0|0), \\ {{P}_{{\rm Q}}}(0|1) & = & |\langle 0|{{{\hat{G}}}_{1}}{{{\hat{G}}}_{0}}|0\rangle {{|}^{2}}={{P}_{{\rm c}}}(0|1)+{{p}_{\operatorname{int}}}{\rm cos} \Delta , \\ \end{array}\end{eqnarray} \tag{ 11 }$

where ${{p}_{\operatorname{int}}}=2\sqrt{{{p}_{0}}{{p}_{1}}{{q}_{0}}{{q}_{1}}}$ , and $\Delta ={{\phi }_{1}}-{{\phi }_{0}}$ is the difference of the phases of the two unitaries ${{\hat{G}}_{0}}$ and ${{\hat{G}}_{1}}$ . Thus, the task fidelity ${{\mathcal{F}}_{{\rm Q}}}$ of the quantum machine is evaluated as

$\begin{eqnarray}&&{{\mathcal{F}}_{{\rm Q}}}({{p}_{0}},{{p}_{1}})=\sqrt[4]{\mathcal{F}_{{\rm c}}^{4}+{{p}_{0}}{{p}_{\operatorname{int}}}{\rm cos} \Delta },\end{eqnarray} \tag{ 12 }$

where the additional term ${\rm cos} \Delta$ is apparently the result of quantum superposition. From the result of equation (12), we can see that

$\begin{eqnarray}\left\{ \begin{array}{ccccccccccccccc} {{\mathcal{F}}_{{\rm Q}}}\gt {{\mathcal{F}}_{{\rm c}}} & \;\;{\rm if}\ {\rm cos} \Delta \gt 0, \\ {{\mathcal{F}}_{{\rm Q}}}\lt {{\mathcal{F}}_{{\rm c}}} & \;\;{\rm if}\ {\rm cos} \Delta \lt 0, \\ \end{array} \right.\end{eqnarray} \tag{ 13 }$

provided that $0\lt {{p}_{j}}\lt 0$ (j = 0,1). The phase Δ plays an important role in helping the quantum machine via constructive interference, leading to ${{\mathcal{F}}_{{\rm Q}}}\gt {{\mathcal{F}}_{{\rm c}}}$ . The task fidelities for the other three targets are also listed in table 2. Note here that, for all cases of the target function f_j, ${{\mathcal{F}}_{{\rm Q}}}$ can always be larger than ${{\mathcal{F}}_{{\rm c}}}$ on choosing appropriate free parameters ϕ₁ and ϕ₂ before the learning. Therefore, the quantum machine has wider acceptable regions than the classical machine for a given tolerance. In figure 2, the task fidelity and the acceptable region for each machine are shown for the target f₁ when $\Delta =0$ is chosen to maximize the difference between the two machines. We also found that the acceptable region of the quantum machine is about 5.6 times the size of that of the classical machine.

**Figure 2.** Left column: the task fidelities for classical and quantum machines. Right column: green lines in the magnified views indicate the acceptable regions for a given tolerable error ${{\epsilon }_{t}}=0.05$ around the exact target point, $({{p}_{0}},{{p}_{1}})=(1,1)$ . Here, we set $\Delta =0$ to maximize the task fidelity of the quantum machine. It is found that the acceptable region of the quantum machine is about 5.6 times the size of that of the classical machine.
Download figure:
Standard image High-resolution image

**Figure 2.** Left column: the task fidelities for classical and quantum machines. Right column: green lines in the magnified views indicate the acceptable regions for a given tolerable error ${{\epsilon }_{t}}=0.05$ around the exact target point, $({{p}_{0}},{{p}_{1}})=(1,1)$ . Here, we set $\Delta =0$ to maximize the task fidelity of the quantum machine. It is found that the acceptable region of the quantum machine is about 5.6 times the size of that of the classical machine.
Download figure:
Standard image High-resolution image

Table 2. The task fidelities of the quantum and classical machines are given in terms of the probabilities (p₀ and p₁) for each target function of the one-bit Boolean function. The phase Δ is defined in the main text, and it plays an important role in quantum machine learning.

Function	${{\mathcal{F}}_{{\rm c}}}({{p}_{0}},{{p}_{1}})$	${{\mathcal{F}}_{{\rm Q}}}({{p}_{0}},{{p}_{1}})$
f₁	$\sqrt[4]{{{p}_{0}}({{p}_{0}}{{p}_{1}}+{{q}_{0}}{{q}_{1}})}$	$\sqrt[4]{\mathcal{F}_{{\rm c}}^{4}+{{p}_{0}}{{p}_{\operatorname{int}}}{\rm cos} \Delta }$
f₂	$\sqrt[4]{{{q}_{0}}({{q}_{0}}{{p}_{1}}+{{p}_{0}}{{q}_{1}})}$	$\sqrt[4]{\mathcal{F}_{{\rm c}}^{4}-{{q}_{0}}{{p}_{\operatorname{int}}}{\rm cos} \Delta }$
f₃	$\sqrt[4]{{{p}_{0}}({{q}_{0}}{{p}_{1}}+{{p}_{0}}{{q}_{1}})}$	$\sqrt[4]{\mathcal{F}_{{\rm c}}^{4}-{{p}_{0}}{{p}_{\operatorname{int}}}{\rm cos} \Delta }$
f₄	$\sqrt[4]{{{q}_{0}}({{p}_{0}}{{p}_{1}}+{{q}_{0}}{{q}_{1}})}$	$\sqrt[4]{\mathcal{F}_{{\rm c}}^{4}+{{q}_{0}}{{p}_{\operatorname{int}}}{\rm cos} \Delta }$

The optimal phase condition for improving the task fidelity, as in equation (13), can be generalized to an arbitrary N-bit Boolean function ( $N\gt 1$ ). We provide one of the conditions as

$\begin{eqnarray}{{\phi }_{k}}=\left\{ \begin{array}{ccccccccccccccc} 0 & \;\;{\rm if}\ {{s}_{k}}=0 \\ \pi & \;\;{\rm if}\ {{s}_{k}}=1 \\ \end{array} \right..\end{eqnarray} \tag{ 14 }$

where s_k is the kth component of a solution point ${{{\rm Q}}_{f}}({{s}_{0}},{{s}_{1}},\ldots ,{{s}_{{{2}^{N}}-1}})$ in the ${{2}^{N}}$ -dimensional search space (see appendix A). This condition yields ${{\mathcal{F}}_{{\rm Q}}}\geqslant {{\mathcal{F}}_{{\rm c}}}$ , so the acceptable region of the quantum machine can be wider than that of the classical machine for an arbitrary N-bit Boolean function.

4. Learning speedup via an expanded acceptable region

This section is devoted to the learning time in machine learning. For a numerical simulation, we employ random search as a feedback; this has often been considered for studying learning performance, rather than for any practical reasons [24]. Random search runs as follows. First, all ${{2}^{N}}$ control parameters p_k are randomly chosen, and then, the task fidelity is measured with the chosen p_k parameters. These two steps are thought of as a single iteration of the procedure. The iterations are repeated until the condition $\mathcal{F}\geqslant 1-{{\epsilon }_{t}}$ is satisfied for a given _t. After a sufficient number of simulations have been performed, we then calculate the mean iteration number defined as ${{n}_{c}}=\sum nP(n)$ , where P(n) is the probability of completing learning at the nth iteration. This mean iteration number, n_c, can be used to quantify the learning time, and the results of numerical simulations for n_c are shown in table 3, where quantum learning is demonstrated to be faster than classical learning. This is a direct result of the wider acceptable region of the quantum machine, as n_c is inversely proportional to the size of the acceptable region in random search; ${{n}_{c}}=1/\gamma$ is given by substituting in $P(n)=\gamma {{(1-\gamma )}^{(n-1)}}$ , where γ is equal to the ratio of the acceptable region to the whole space in random search. We demonstrate this by comparing the results for n_c with the acceptable regions γ found from Monte Carlo simulation, given in table 3, and thereby we note that the acceptable region is the main feature directly influencing the learning time in random search.

Table 3. The learning time n_c is compared with the acceptable regions γ, and it is demonstrated that ${{n}_{c}}={{\gamma }^{-1}}$ . This implies that a larger acceptable region leads to a lesser learning time. Simulation failed for N = 4 and 5 in the classical case due to the finite computational resources and very long run time.

	Classical		Quantum
N	γ⁻¹	n_c	γ⁻¹	n_c
1	$1.0\times {{10}^{2}}$	$1.03\times {{10}^{2}}$	$1.8\times {{10}^{1}}$	$1.74\times {{10}^{1}}$
2	$1.4\times {{10}^{4}}$	$1.39\times {{10}^{4}}$	$2.6\times {{10}^{1}}$	$2.68\times {{10}^{1}}$
3	$4.4\times {{10}^{8}}$	$4.67\times {{10}^{8}}$	$5.5\times {{10}^{1}}$	$5.36\times {{10}^{1}}$
4	$9.8\times {{10}^{18}}$	—	$3.5\times {{10}^{2}}$	$3.48\times {{10}^{2}}$
5	$7.1\times {{10}^{41}}$	—	$2.5\times {{10}^{4}}$	$2.48\times {{10}^{4}}$

Also in figure 3, the data for n_c in table 3 are well fitted to a function ${\rm ln} {{n}_{c}}=\alpha D+\beta$ , implying that the size of the acceptable region is exponentially decreased as the dimension $D={{2}^{N}}$ of the parameter space increases, i.e. ${{n}_{c}}={\rm O}({{{\rm e}}^{\alpha D}})$ [32]. The fitting parameters are given as

$\begin{eqnarray}\left\{ \begin{array}{ccccccccccccccc} \alpha \simeq 3.065\pm 0.072,\;\beta \simeq -3.188\pm 1.196 & \;{\rm in}\ {\rm the}\ {\rm classical}\ {\rm case}, \\ \alpha \simeq 0.238\pm 0.008,\;\beta \simeq 2.267\pm 0.127 & \;{\rm in}\ {\rm the}\ {\rm quantum}\ {\rm case}. \\ \end{array} \right.\end{eqnarray} \tag{ 15 }$

It is remarkable that the exponent α in the quantum case is much smaller than that in the classical case.

**Figure 3.** The learning time, n_c, with the dimension $D={{2}^{N}}$ of the parameter space for 1000 realizations. In this work, we consider a constant target function that yields 0 for all inputs ${\boldsymbol{x}}$ , the optimal phase condition of equation (14) is chosen for the quantum machine, and the tolerable error _t is set as 0.05. The data are well fitted to ${\rm ln} {{n}_{c}}=\alpha D+\beta$ in the classical (red line) and quantum (blue line) cases, with the fitting parameters α and β as in equation (15).
Download figure:
Standard image High-resolution image

**Figure 3.** The learning time, n_c, with the dimension $D={{2}^{N}}$ of the parameter space for 1000 realizations. In this work, we consider a constant target function that yields 0 for all inputs ${\boldsymbol{x}}$ , the optimal phase condition of equation (14) is chosen for the quantum machine, and the tolerable error _t is set as 0.05. The data are well fitted to ${\rm ln} {{n}_{c}}=\alpha D+\beta$ in the classical (red line) and quantum (blue line) cases, with the fitting parameters α and β as in equation (15).
Download figure:
Standard image High-resolution image

It follows from what has been shown that the acceptable region is the main feature directly influencing the learning time in random search. We have proved that we can always prepare a quantum machine which has an acceptable region larger than that of the classical one, in the previous section. Therefore, we finally conclude that the learning time can be shorter in the quantum case than in the classical case. The results of numerical simulation also support the assertion that the quantum machine learns much faster, particularly in a large search space. We clarify again that such a quantum speedup is enabled by the quantum superposition, and appropriately arranged phases.

5. Applying differential evolution

We consider a more practical learning model, taking into account real circumstances. A general analysis of the learning efficiency is very complicated, as so many factors are associated with the learning behavior. Furthermore, the most efficient learning algorithms tend to use heuristic rules and are problem-specific [33, 34]. Nevertheless, it is usually believed that the acceptable region is a key factor for the efficiency of learning in a heuristic manner [32]. We conjecture, in this sense, that the quantum machine offers the quantum speedup even in a practical learning method.

We apply differential evolution (DE), which is known as one of the most efficient learning methods for global optimization [30]. We start with M sets of control parameter vectors ${{{\boldsymbol{p}} }_{i}}={{({{p}_{0}},{{p}_{1}},...,{{p}_{{{2}^{N}}-1}})}_{i}}$ , for $i=1,2,...,M$ , whose components are the control parameters of the machine. In DE, these vectors, ${{{\boldsymbol{p}} }_{i}}$ , are supposed to evolve by 'mating' their components p_k with each other. Equation (6) is used as a criterion for how well machines with ${{{\boldsymbol{p}} }_{i}}$ fit to the target. This process is iterated until the task fidelity reaches a certain level of accuracy $1-{{\epsilon }_{t}}$ (see [30] or [20] for the detailed method of effecting differential evolution).

We perform the numerical simulations by increasing N from 1 to 7. The results are averaged over 1000 realizations for M = 50 and ${{\epsilon }_{t}}=0.05$ . The target function is a constant function: $f({\boldsymbol{x}} )=0$ for all ${\boldsymbol{x}}$ . Free parameters in differential evolution (e.g., the crossover rate and differential weight) are chosen to achieve the best learning efficiency for a classical machine⁶ . Nevertheless, we expect the quantum machine to still exhibit the quantum speedup, assisted by the quantum superposition, with the optimal phases in equation (14). We give the mean task fidelity averaged over M in figure 4(a). For both classical and quantum cases, the mean task fidelities are increased close to 1, but the quantum machine is much faster for all cases. We investigate the learning time n_c as we increase the dimension $D={{2}^{N}}$ of the parameter space, as depicted in figure 4(b). The data are well fitted to a presumable function ${{n}_{{\rm c}}}\simeq \alpha {{D}^{\beta }}$ , with $\alpha \simeq 3.82$ , $\beta \simeq 0.97$ for the classical machine, and $\alpha \simeq 1.61$ , $\beta \simeq 0.80$ for the quantum machine⁷ . We note that the quantum machine still exhibits the speedup with the smaller α and β. Therefore, we expect such quantum speedup to be achievable even in real circumstances.

**Figure 4.** (a) The mean task fidelity is given with respect to the iteration n. The simulations are done increasing N from 1 to 7. It is readily observed that the increments of the task fidelities are faster in the quantum situation for all cases. (b) The learning time, ${{n}_{{\rm c}}}$ , as the dimension D of the parameter space increases is shown. The data are well fitted to a presumable function ${{n}_{{\rm c}}}\simeq \alpha {{D}^{\beta }}$ , with $\alpha \simeq 3.82$ , $\beta \simeq 0.97$ in the classical case (red line), and $\alpha \simeq 1.61$ , $\beta \simeq 0.80$ in the quantum case (blue line). Note that the quantum machine still shows better convergence with the smaller α and β.
Download figure:
Standard image High-resolution image

**Figure 4.** (a) The mean task fidelity is given with respect to the iteration n. The simulations are done increasing N from 1 to 7. It is readily observed that the increments of the task fidelities are faster in the quantum situation for all cases. (b) The learning time, ${{n}_{{\rm c}}}$ , as the dimension D of the parameter space increases is shown. The data are well fitted to a presumable function ${{n}_{{\rm c}}}\simeq \alpha {{D}^{\beta }}$ , with $\alpha \simeq 3.82$ , $\beta \simeq 0.97$ in the classical case (red line), and $\alpha \simeq 1.61$ , $\beta \simeq 0.80$ in the quantum case (blue line). Note that the quantum machine still shows better convergence with the smaller α and β.
Download figure:
Standard image High-resolution image

6. Summary and discussion

We investigated the learning performances of two machines by considering the task of finding an N-bit Boolean function which can be used in a binary classification problem. The two machines were designed equivalently to make the comparison of these two machines as convincing as possible. The critical difference between the two machines was that the operations in the quantum machine are described by unitary operators, to deal with the quantum superposition. The learning processes of the two machines were characterized in terms of the acceptable region: the localized region of the parameter space including approximate solutions. We have found that the quantum machine has a wider acceptable region, induced by quantum superposition. We demonstrated a simulation with a standard feedback method, namely random search, to show that the sizes of the acceptable regions were inversely proportional to the learning time. Here, it was also shown that a wider acceptable region makes the learning faster; that is, the learning time is proportional to ${\rm O}({{{\rm e}}^{\alpha D}})$ , with $\alpha \simeq 3.065$ in the classical learning case and $\alpha \simeq 0.238$ for the quantum machine. We then applied a practical learning method, namely differential evolution, to our main task, and observed the learning speedup of the quantum machine.

Here, we wish to recall that the maximized learning speedup of the quantum machine is achieved by choosing suitable phases as in equation (14). From a practical perspective, one may consider that an additional task, such as finding the relative phases, is required to ensure the remarkable performance of the quantum learning machine for other N-bit Boolean function targets. Alternatively, such an issue could be resolved by synchronizing the relative phases with the control parameters in the quantum machine, still yielding the learning speedup (see appendix B for details).

We expect our work to motivate researchers to study the role of various quantum effects in machine learning, and to open up new possibilities for improving the machine learning performance. It is still open whether the quantum machine can be improved more by using other quantum effects, such as quantum entanglement.

Acknowledgments

We acknowledge the financial support of National Research Foundation of Korea (NRF) grants funded by the Ministry of Science, ICT & Future Planning (No. 2010–0018295 and No. 2010–0015059). We also thank T Ralph, M Żukowski and H J Briegel for discussions and advice.

Appendix A.: Finding the optimal phase condition in equation (14)

Let us recall the general form of the task fidelity, as in equation (6). We suppose the target to be a deterministic function. Then, equation (6) is rewritten as

$\begin{eqnarray}&&\mathcal{F}({{p}_{0}},{{p}_{1}},\ldots ,{{p}_{{{2}^{N}}-1}})={{\left( \mathop{\prod }\limits_{{\boldsymbol{x}} }P(\;f({\boldsymbol{x}} )|{\boldsymbol{x}} ) \right)}^{\frac{1}{{{2}^{N+1}}}}}.\end{eqnarray} \tag{ A.1 }$

In deriving the above reduced form of equation (A.1), we used that ${{P}_{\tau }}(y|{\boldsymbol{x}} )=1$ when y is equal to the desired value $f({\boldsymbol{x}} )$ for a given target f, and otherwise ${{P}_{\tau }}(y|{\boldsymbol{x}} )=0$ . equation (A.1) shows that the task fidelity is enlarged if $P(f({\boldsymbol{x}} )|{\boldsymbol{x}} )$ is maximized for all ${\boldsymbol{x}} \ne 0$ .

To start, consider an ideal learning machine (either classical or quantum) that always generates the desired outcome results with perfect task fidelity, $\mathcal{F}=1$ . From our analysis in section 3, we can represent this machine as a point ${\rm S}=({{s}_{0}},{{s}_{1}},...,{{s}_{{{2}^{N}}-1}})$ in the ${{2}^{N}}$ -dimensional search space. In this sense, we consider this ideal machine the 'solution machine'. We then consider a 'near-solution machine' which is located at a point ${\rm Q}=({{p}_{0}},{{p}_{1}},...,{{p}_{{{2}^{N}}-1}})$ in the search space. More specifically, $d({\rm Q},{\rm S})=\sqrt{\sum _{k=0}^{{{2}^{N}}-1}{{\left( {{s}_{k}}-{{p}_{k}} \right)}^{2}}}=\delta$ , where $d({\rm Q},{\rm S})$ is the Euclidean distance. Here we assume further that the search space is isotropic around ${\rm S}$ , so the machines on the surface of the hypersphere $d({\rm Q},{\rm S})=\delta$ have the same task fidelity. This assumption is physically reasonable for very small tolerance error. Thus, without loss of generality, we consider the near-solution machine corresponding to the point ${\rm Q}$ on the sphere $d({\rm Q},{\rm S})=\delta$ , satisfying $|{{s}_{k}}-{{p}_{k}}|=c$ for all k. Here, $c=\sqrt{\delta /{{2}^{N}}}$ .

In these circumstances, $P(f({\boldsymbol{x}} )|{\boldsymbol{x}} )$ for a classical near-solution machine is necessarily smaller than 1, depending on δ. On the other hand, if we choose the optimal phases ϕ_k, then $P(f({\boldsymbol{x}} )|{\boldsymbol{x}} )$ can be 1 without any dependence on δ in the quantum machine. To show this, let us first write the conditional probability $P(f({\boldsymbol{x}} )|{\boldsymbol{x}} )$ in equation (A.1) as

$\begin{eqnarray}&&P(f({\boldsymbol{x}} )|{\boldsymbol{x}} )=|\langle f({\boldsymbol{x}} )|\left( \mathop{\prod }\limits_{k\in {{A}_{{\boldsymbol{x}} }}}{{{\hat{G}}}_{k}} \right)|0\rangle {{|}^{2}},\end{eqnarray} \tag{ A.2 }$

where ${{A}_{{\boldsymbol{x}} }}$ is the index set whose elements are indices of the actually applied operators conditioned on the input ${\boldsymbol{x}} =\{{{x}_{1}},{{x}_{2}},...,{{x}_{N}}\}$ . For example, if ${\boldsymbol{x}} =1$ (i.e. $\{1,0,0...,0\}$ in the binary representation), then we have ${{A}_{{\boldsymbol{x}} }}=\{0,1\}$ because G₀ is always applied independently of the input, and the input signal ${{x}_{1}}=1$ activates G₁ (see figure 1). Thus, ${{\prod }_{k\in {{A}_{1}}}}{{\hat{G}}_{k}}={{\hat{G}}_{1}}{{\hat{G}}_{0}}$ . On the basis of the above description, we can generalize the calculations as

$\begin{eqnarray}\left\{ \begin{array}{ccccccccccccccc} \mathop{\prod }\limits_{k\in {{A}_{1}}}{{{\hat{G}}}_{k}}={{{\hat{G}}}_{1}}{{{\hat{G}}}_{0}} & {\rm for}\;{\boldsymbol{x}} =1, \\ \mathop{\prod }\limits_{k\in {{A}_{2}}}{{{\hat{G}}}_{k}}={{{\hat{G}}}_{2}}{{{\hat{G}}}_{0}} & {\rm for}\;{\boldsymbol{x}} =2, \\ \mathop{\prod }\limits_{k\in {{A}_{3}}}{{{\hat{G}}}_{k}}={{{\hat{G}}}_{3}}{{{\hat{G}}}_{2}}{{{\hat{G}}}_{1}}{{{\hat{G}}}_{0}} & {\rm for}\;{\boldsymbol{x}} =3, \\ \qquad \vdots & {} \\ \end{array} \right.\end{eqnarray} \tag{ A.3 }$

Here, equation (A.2) becomes 1 when c = 0 or equivalently $d({\rm Q},{\rm S})=0$ , because it is nothing but the solution machine. The basic idea is to find a condition for which all terms of c vanish even though c is nonzero, i.e. the near-solution machine. Therefore, $P(f({\boldsymbol{x}} )|{\boldsymbol{x}} )$ for the near-solution machine is mathematically equal to that of the solution machine. To carry out the task, we consider the product of two arbitrary unitaries ${{\hat{G}}_{k}}{{\hat{G}}_{l}}$ ( $k\ne l$ ), as

$\begin{eqnarray}\left( \begin{array}{ccccccccccccccc} \sqrt{{{p}_{k}}} & {{{\rm e}}^{{\rm i}{{\phi }_{k}}}}\sqrt{{{q}_{k}}} \\ {{{\rm e}}^{-{\rm i}{{\phi }_{k}}}}\sqrt{{{q}_{k}}} & -\sqrt{{{p}_{k}}} \\ \end{array} \right)\left( \begin{array}{ccccccccccccccc} \sqrt{{{p}_{l}}} & {{{\rm e}}^{{\rm i}{{\phi }_{l}}}}\sqrt{{{q}_{l}}} \\ {{{\rm e}}^{-{\rm i}{{\phi }_{l}}}}\sqrt{{{q}_{l}}} & -\sqrt{{{p}_{l}}} \\ \end{array} \right).\end{eqnarray} \tag{ A.4 }$

If we consider the near-solution machine, we can let ${{p}_{k(l)}}=|{{s}_{k(l)}}-c|$ and ${{q}_{k(l)}}=1-{{p}_{k(l)}}$ . We then calculate ${{\hat{G}}_{k}}{{\hat{G}}_{l}}$ , for the given ${{s}_{k}},{{s}_{l}}$ in ${\rm S}$ , as

$\begin{eqnarray*}\begin{array}{rcl} {{{\hat{G}}}_{k}}{{{\hat{G}}}_{l}} & = & \left( \begin{array}{ccccccccccccccc} {{{\rm e}}^{{\rm i}\Delta }}(1+{{{\rm e}}^{-{\rm i}\Delta }}c{{\Lambda }_{-}}) & g(c){{{\rm e}}^{{\rm i}{{\phi }_{l}}}}{{\Lambda }_{-}} \\ g(c){{{\rm e}}^{-{\rm i}{{\phi }_{k}}}}{{\Lambda }_{-}} & {{{\rm e}}^{-{\rm i}\Delta }}(1-c{{\Lambda }_{-}}) \\ \end{array} \right)\quad \;{\rm if}\ {{s}_{k}}=0,{{s}_{l}}=0, \\ \end{array}\end{eqnarray*}$

$\begin{eqnarray}\begin{array}{rcl} {{{\hat{G}}}_{k}}{{{\hat{G}}}_{l}} & = & \left( \begin{array}{ccccccccccccccc} g(c){{\Lambda }_{+}} & {{{\rm e}}^{{\rm i}{{\phi }_{l}}}}(1-c{{\Lambda }_{+}}) \\ -{{{\rm e}}^{-{\rm i}{{\phi }_{l}}}}(1+c{{{\rm e}}^{-{\rm i}\Delta }}{{\Lambda }_{+}}) & g(c){{{\rm e}}^{-{\rm i}\Delta }}{{\Lambda }_{+}} \\ \end{array} \right)\;{\rm if}\ {{s}_{k}}=1,{{s}_{l}}=0, \\ {{{\hat{G}}}_{k}}{{{\hat{G}}}_{l}} & = & \left( \begin{array}{ccccccccccccccc} g(c){{\Lambda }_{+}} & -{{{\rm e}}^{{\rm i}{{\phi }_{k}}}}(1-c{{{\rm e}}^{-{\rm i}\Delta }}{{\Lambda }_{+}}) \\ {{{\rm e}}^{-{\rm i}{{\phi }_{k}}}}(1-c{{\Lambda }_{+}}) & g(c){{{\rm e}}^{-{\rm i}\Delta }}{{\Lambda }_{+}} \\ \end{array} \right)\;{\rm if}\ {{s}_{k}}=0,{{s}_{l}}=1, \\ {{{\hat{G}}}_{k}}{{{\hat{G}}}_{l}} & = & \left( \begin{array}{ccccccccccccccc} 1-c{{\Lambda }_{-}} & -g(c){{{\rm e}}^{{\rm i}{{\phi }_{k}}}}{{\Lambda }_{-}} \\ g(c){{{\rm e}}^{-{\rm i}{{\phi }_{k}}}}{{\Lambda }_{-}} & 1-c{{{\rm e}}^{{\rm i}\Delta }}{{\Lambda }_{-}} \\ \end{array} \right)\quad \;{\rm if}\ {{s}_{k}}=1,{{s}_{l}}=1, \\ \end{array}\end{eqnarray} \tag{ A.5 }$

where ${{\Lambda }_{\pm }}=1\pm {{{\rm e}}^{{\rm i}\Delta }}$ , $\Delta ={{\phi }_{k}}-{{\phi }_{l}}$ , and $g(c)=\sqrt{c-{{c}^{2}}}$ . In calculating equation (A.5), we assumed a deterministic target, i.e. ${{s}_{k(l)}}$ is to be either 0 or 1, as is usual in most tasks (but not necessarily the case). Here, the important thing is that we can cause the term associated with c to vanish, by letting

$\begin{eqnarray}{{\Lambda }_{\pm }}=0,\;{\rm or}\ {\rm equivalently},\;\left\{ \begin{array}{ccccccccccccccc} {{\phi }_{k}}={{\phi }_{l}} & \;\;{\rm if}\ {{s}_{l}}={{s}_{k}}, \\ {{\phi }_{k}}={{\phi }_{l}}+\pi & \;\;{\rm if}\ {{s}_{l}}\ne {{s}_{k}}. \\ \end{array} \right.\end{eqnarray} \tag{ A.6 }$

The above condition in equation (A.6) can be applied for all $k\ne l$ . Thus, we provide here a generalized condition:

$\begin{eqnarray}{{\phi }_{k}}=\left\{ \begin{array}{ccccccccccccccc} 0 & \;\;{\rm if}\ {{s}_{k}}=0, \\ \pi & \;\;{\rm if}\ {{s}_{k}}=1. \\ \end{array} \right.\end{eqnarray} \tag{ A.7 }$

This is the optimal phase condition, as in (14). We can check that this condition gives the maximum task fidelity with $P(f({\boldsymbol{x}} )|{\boldsymbol{x}} )=1$ (for all ${\boldsymbol{x}} \ne 0$ ).

Appendix B.: A practical version of a quantum machine

The speedup introduced in this paper is enabled when a quantum machine uses suitable phases. Accordingly, the suitable phases are prerequisites for fast learning. In a practical case, the learning time has to include complexity to obtain suitable phases, and this is not very easy to achieve. We introduce a practical quantum machine that does not require the effort of finding optimal phases. To this end, we modify the unitary ${{\hat{G}}_{k}}$ in equation (5) by setting all the phases ϕ_k as $\pi {{p}_{k}}$ , i.e., ${{\hat{G}}_{k}}$ is written as

$\begin{eqnarray}{{\hat{G}}_{k}}=\left( \begin{array}{ccccccccccccccc} \sqrt{{{p}_{k}}} & {{{\rm e}}^{{\rm i}\pi {{{\rm p}}_{k}}}}\sqrt{1-{{p}_{k}}} \\ {{{\rm e}}^{-{\rm i}\pi {{{\rm p}}_{k}}}}\sqrt{1-{{p}_{k}}} & -\sqrt{{{p}_{k}}} \\ \end{array} \right),\end{eqnarray} \tag{ B.1 }$

such that the phases $\pi {{p}_{k}}$ are getting closer to the optimized phases $\pi {{s}_{k}}$ as the machine approaches the solution point in the parameter space during the learning, since the optimized phase condition is given by equation (14). Thus, this guarantees wider acceptable regions than for the classical machine for any learning target.

Figure B1 (a) shows that the practical quantum machine has wider acceptable regions than the classical machine for all one-bit Boolean targets. The areas inside the solid and dashed lines represent the acceptable regions for the practical quantum machine and the classical machine, respectively. This supports the assertion that the practical quantum machine always learns faster than the classical machine, while the performance of the original quantum machine depends on the target function.

We then obtained the learning time of the practical quantum machine, which is shown in figure B1(b). These data are also well fitted to ${\rm ln} {{n}_{c}}=\alpha D+\beta$ , with the fitting parameters $\alpha \simeq 0.985\pm 0.101$ and $\beta \simeq -0.200\pm 1.662$ . Thus, ${{n}_{c}}\sim {\rm O}({{{\rm e}}^{0.985D}})$ for the practical quantum machine, whereas ${{n}_{c}}\simeq {\rm O}({{{\rm e}}^{3.065D}})$ for the classical machine (see equation (15)). This result shows that a considerable learning speedup is still achieved with this practical quantum machine, even though it takes up a little more time as compared to an original machine available with the optimal relative phases ( ${{n}_{c}}\sim {\rm O}({{{\rm e}}^{0.238D}})$ ).