Main

The complex nature of quantum many-body systems motivates the use of machine learning techniques to analyse them. Indeed, large-scale neural networks have successfully solved classically difficult problems such as image recognition or optimization of classical error correction1, and their architectures have been related to various physical concepts2,3. As such, a number of recent works have used neural networks to study properties of quantum many-body systems4,5,6,7,8,9,10. However, the direct application of these classical algorithms is challenging for intrinsically quantum problems, which take quantum states or processes as inputs. This is because the extremely large many-body Hilbert space hinders the efficient translation of such problems into a classical framework without performing exponentially difficult quantum state or process tomography11,12.

Recent experimental progress towards realizing quantum information processors13,14,15,16 has led to proposals for the use of quantum computers to enhance conventional machine learning tasks17,18,19,20. Motivated by such developments, we introduce and analyse a machine learning-inspired quantum circuit model—the quantum convolutional neural network (QCNN)—and demonstrate its ability to solve important classes of intrinsically quantum many-body problems. The first class of problems we consider is quantum phase recognition (QPR), which asks whether a given input quantum state ρin belongs to a particular quantum phase of matter. In contrast to many existing schemes based on tensor network descriptions21,22,23, we assume that ρin is prepared in a physical system without direct access to its classical description. The second class, quantum error correction (QEC) optimization, asks for an optimal QEC code for a given, a priori unknown error model such as dephasing or potentially correlated depolarization in realistic experimental settings. We provide both theoretical insight and numerical demonstrations for the successful application of a QCNN to these important problems, and show its feasibility for near-term experimental implementation.

QCNN circuit model

Convolutional neural networks provide a successful machine learning architecture for classification tasks such as image recognition1,24,25. A CNN generally consists of a sequence of different (interleaved) layers of image processing; in each layer, an intermediate two-dimensional array of pixels, called a feature map, is produced from the previous one (Fig. 1a). (More generally, CNN layers connect ‘volumes’ of multiple feature maps to subsequent volumes; for simplicity, we consider only a single feature map per volume and leave the generalization to future works.) The convolution layers compute new pixel values \(x_{i,j}^{(\ell )}\) from a linear combination of nearby ones in the preceding map, \(x_{i,j}^{(\ell )} = \mathop {\sum}\limits_{a,b = 1}^w {w_{a,b}} x_{i + a,j + b}^{(\ell - 1)}\), where the weights wa,b form a w × w matrix. Pooling layers reduce the feature map size, for example by taking the maximum value from a few contiguous pixels, and are often followed by the application of a nonlinear (activation) function. Once the feature map size becomes sufficiently small, the final output is computed from a function that depends on all remaining pixels (the fully connected layer). The weights and the fully connected function are optimized by training on large datasets. In contrast, variables such as the number of convolution and pooling layers and the size w of the weight matrices (known as hyperparameters) are fixed for a specific CNN1. The key properties of a CNN are thus translationally invariant convolution and pooling layers, each characterized by a constant number of parameters (independent of system size), and sequential data size reduction (that is, a hierarchical structure).
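The convolution and pooling operations described above can be sketched in a few lines of NumPy. This is an illustrative toy only (a single feature map, 'valid' convolution, max pooling followed by a ReLU activation), not an implementation from any specific CNN library:

```python
import numpy as np

def conv_layer(fmap, weights):
    """'Valid' 2D convolution: each new pixel is a linear combination of a
    w x w patch of the preceding feature map, weighted by the w x w matrix."""
    w = weights.shape[0]
    h, wd = fmap.shape
    out = np.empty((h - w + 1, wd - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(weights * fmap[i:i + w, j:j + w])
    return out

def max_pool(fmap, k=2):
    """Max pooling over k x k blocks of contiguous pixels, followed here
    by a ReLU nonlinearity (one common choice of activation)."""
    h, w = fmap.shape[0] // k * k, fmap.shape[1] // k * k
    blocks = fmap[:h, :w].reshape(h // k, k, w // k, k)
    return np.maximum(blocks.max(axis=(1, 3)), 0.0)
```

Alternating these two operations shrinks the feature map until a fully connected function of the remaining pixels produces the output.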

Fig. 1: The concept of QCNNs.
figure 1

a, Simplified illustration of classical CNNs. A sequence of image-processing layers transforms an input image into a series of feature maps (blue rectangles) and finally into an output probability distribution (purple bars). C, convolution; P, pooling; FC, fully connected. b, QCNNs inherit a similar layered structure. Boxes represent unitary gates or measurement with feed-forwarding. c, The QCNN and the MERA share the same circuit structure, but run in reverse directions. Image of cat from https://www.pexels.com/photo/grey-and-white-short-fur-cat-104827/.

Motivated by this architecture, we introduce a QCNN circuit model extending these key properties to the quantum domain (Fig. 1b). The circuit’s input is an unknown quantum state ρin. A convolution layer applies a single quasilocal unitary (Ui) in a translationally invariant manner for finite depth. For pooling, a fraction of qubits are measured, and their outcomes determine unitary rotations (Vj) applied to nearby qubits. Hence, nonlinearities in QCNNs arise from reducing the number of degrees of freedom. Convolution and pooling layers are performed until the system size is sufficiently small; then, a fully connected layer is applied as a unitary F on the remaining qubits. Finally, the outcome of the circuit is obtained by measuring a fixed number of output qubits. As in the classical case, QCNN hyperparameters such as the number of convolution and pooling layers are fixed, and the unitaries themselves are learned.

A QCNN to classify N-qubit input states is thus characterized by O(log(N)) parameters. This corresponds to a double exponential reduction compared with a generic quantum circuit-based classifier19 and allows for efficient learning and implementation. For example, given a set of M classified training vectors {(|ψα〉, yα): α = 1, …, M}, where |ψα〉 are input states and yα = 0 or 1 are corresponding binary classification outputs, one could compute the mean squared error

$${\mathrm{MSE}} = \frac{1}{{2M}}\mathop {\sum}\limits_{\alpha = 1}^M {\left( {y_\alpha - f_{\{ U_i,V_j,F\} }\left( {\left| {\psi _\alpha } \right\rangle } \right)} \right)^2}$$
(1)

Here, \(f_{\{ U_i,V_j,F\} }\left( {\left| {\psi _\alpha } \right\rangle } \right)\) denotes the expected QCNN output value for input |ψα〉. Learning then consists of initializing all unitaries and successively optimizing them until convergence, for example via gradient descent.
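This training loop can be sketched as follows. The toy code below minimizes the MSE of equation (1) by central finite-difference gradient descent; the parameterized function `f` is a generic stand-in for the QCNN output \(f_{\{ U_i,V_j,F\} }\) and is assumed here purely for illustration:

```python
import numpy as np

def mse(theta, f, states, labels):
    """Equation (1): mean squared error over M training pairs (states, labels)."""
    return sum((y - f(theta, s)) ** 2
               for s, y in zip(states, labels)) / (2 * len(states))

def train(theta, f, states, labels, lr=0.5, eps=1e-5, steps=200):
    """Gradient descent on the MSE, with gradients estimated by central
    finite differences in each parameter (a simple stand-in for the
    optimization of the QCNN unitaries)."""
    theta = np.array(theta, dtype=float)
    for _ in range(steps):
        grad = np.empty_like(theta)
        for k in range(len(theta)):
            d = np.zeros_like(theta)
            d[k] = eps
            grad[k] = (mse(theta + d, f, states, labels)
                       - mse(theta - d, f, states, labels)) / (2 * eps)
        theta -= lr * grad
    return theta
```

For the actual QCNN, each parameter enters through the generators of the unitaries, but the loop structure (evaluate, perturb, descend, repeat until convergence) is the same.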

To gain physical insight into the mechanism underlying QCNNs and motivate their application to the problems under consideration, we now relate our circuit model to two well-known concepts in quantum information theory—the multiscale entanglement renormalization ansatz26 (MERA) and QEC. The MERA framework provides an efficient tensor network representation of many classes of interesting many-body wavefunctions, including those associated with critical systems26,27,28. A MERA can be understood as a quantum state generated by a sequence of unitary and isometry layers applied to an input state (for example |00〉). While both types of layers apply quasilocal unitary gates, each isometry layer first introduces a set of new qubits in a predetermined state, such as |0〉 (Fig. 1c). This exponentially growing, hierarchical structure allows for the long-range correlations associated with critical systems. The QCNN circuit has similar structure but runs in the reverse direction. Hence, for any given state |ψ〉 with a MERA representation, there is always a QCNN that recognizes |ψ〉 with deterministic measurement outcomes; one such QCNN is simply the inverse of the MERA circuit.

For input states other than |ψ〉, however, such a QCNN does not generally produce deterministic measurement outcomes. These additional degrees of freedom distinguish a QCNN from a MERA. Specifically, we can identify the measurements as syndrome measurements in QEC29, which determine error correction unitaries Vj to apply to the remaining qubit(s). Thus, a QCNN circuit with multiple pooling layers can be viewed as a combination of a MERA (an important variational ansatz for many-body wavefunctions) and nested QEC (a mechanism to detect and correct local quantum errors without collapsing the wavefunction). This makes QCNNs a powerful architecture for classifying input quantum states or devising new QEC codes. In particular, for QPR, the QCNN can provide a MERA realization of a representative state |ψ0〉 in the target phase. Other input states within the same phase can be viewed as |ψ0〉 with local errors, which are repeatedly corrected by the QCNN in multiple layers. In this sense, the QCNN circuit can mimic renormalization-group flow, a methodology that successfully classifies many families of quantum phases30. For QEC optimization, the QCNN structure allows for simultaneous optimization of efficient encoding and decoding schemes with potentially rich entanglement structure.

Detecting a 1D symmetry-protected topological phase

We first demonstrate the potential of a QCNN by applying it to QPR in a class of 1D many-body systems. Specifically, we consider a \({\Bbb Z}_2 \times {\Bbb Z}_2\) symmetry-protected topological (SPT) phase \({\mathcal{P}}\), a phase containing the S = 1 Haldane chain31, and ground states {|ψG〉} of a family of Hamiltonians on a spin-1/2 chain with open boundary conditions:

$$H = - J\mathop {\sum}\limits_{i = 1}^{N - 2} Z_iX_{i + 1}Z_{i + 2} - h_1\mathop {\sum}\limits_{i = 1}^N X_i - h_2\mathop {\sum}\limits_{i = 1}^{N - 1} X_iX_{i + 1}$$
(2)

where Xi, Zi are Pauli operators for the spin at site i, and h1, h2 and J are parameters of the Hamiltonian. The \({\Bbb Z}_2 \times {\Bbb Z}_2\) symmetry is generated by \(X_{{\mathrm{even}}({\mathrm{odd}})}= \mathop {\prod}\limits_{i \in {\mathrm{even}}({\mathrm{odd}})} {X_i}\). Figure 2a shows the phase diagram as a function of (h1/J, h2/J). When h2 = 0, the Hamiltonian is exactly solvable via the Jordan–Wigner transformation30, confirming that \({\cal{P}}\) is characterized by non-local order parameters. When h1 = h2 = 0, all terms are mutually commuting, and a ground state is the 1D cluster state. Our goal is to identify whether a given, unknown ground state drawn from the phase diagram belongs to \({\cal{P}}\).
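For small chains, the Hamiltonian of equation (2) can be constructed explicitly and diagonalized exactly. The following NumPy sketch uses dense Kronecker products and is therefore feasible only for small N; it is meant as an illustration, not as the matrix product state machinery actually used in the simulations:

```python
import numpy as np
from functools import reduce

I = np.eye(2)
X = np.array([[0., 1.], [1., 0.]])
Z = np.array([[1., 0.], [0., -1.]])

def op(paulis, sites, N):
    """Tensor product placing the given Pauli matrices at the given sites
    of an N-spin chain (identity elsewhere)."""
    mats = [I] * N
    for P, s in zip(paulis, sites):
        mats[s] = P
    return reduce(np.kron, mats)

def hamiltonian(N, J, h1, h2):
    """Equation (2): cluster terms ZXZ, transverse field X, and Ising XX,
    with open boundary conditions."""
    H = -J * sum(op([Z, X, Z], [i, i + 1, i + 2], N) for i in range(N - 2))
    H += -h1 * sum(op([X], [i], N) for i in range(N))
    H += -h2 * sum(op([X, X], [i, i + 1], N) for i in range(N - 1))
    return H
```

At h1 = h2 = 0 the Hamiltonian reduces to the mutually commuting cluster terms, and one can check directly that H commutes with the symmetry generators X_even and X_odd.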

Fig. 2: Application to quantum phase recognition.
figure 2

a, The phase diagram of the Hamiltonian in the main text. The phase boundary points (blue and red diamonds) are extracted from infinite-size DMRG numerical simulations, while the background shading (colour scale) represents the output from the exact QCNN circuit for input size N = 45 spins (see Methods). b, Exact QCNN circuit to recognize a \({\Bbb Z}_2 \times {\Bbb Z}_2\) SPT phase. Blue line segments represent controlled-phase gates, blue three-qubit gates are Toffoli gates with the control qubits in the X basis, and orange two-qubit gates flip the target qubit’s phase when the X measurement yields −1. The fully connected layer applies controlled-phase gates followed by an Xi projection, effectively measuring Zi−1XiZi+1. c, Exact QCNN output along h1 = 0.5J for N = 135 spins, depths d = 1, …, 4 (from light to dark blue). d, Sample complexity of QCNN at depths d = 1, …, 4 (from light to dark blue) versus SOPs of length N/2, N/3, N/5 and N/6 (from light to dark red) to detect the SPT/paramagnet phase transition along h1 = 0.5J for N = 135 spins. The critical point is identified as h2/J = 0.423 using infinite-size DMRG. In the shaded area, the correlation length exceeds the system size, and finite-size effects can considerably affect our results. Inset: the ratio of SOP sample complexity to QCNN sample complexity is plotted as a function of d on a logarithmic scale for h2/J = 0.3918. In the numerically accessible regime, this reduction of sample complexity scales exponentially as \(1.73\,{\mathrm{e}}^{0.28d}\) (trendline).

As an example, we first present an exact, analytical QCNN circuit that recognizes \({\cal{P}}\) (Fig. 2b). The convolution layers involve controlled-phase gates as well as Toffoli gates with controls in the X-basis, and pooling layers perform phase-flips on remaining qubits when one adjacent measurement yields X = −1. This convolution–pooling unit is repeated d times, where d is the QCNN depth. The fully connected layer measures Zi−1XiZi+1 on the remaining qubits. Figure 2c shows the QCNN output for a system of N = 135 spins and d = 1, …, 4 along h1 = 0.5J, obtained using matrix product state simulations. As d increases, the measurement outcomes show sharper changes around the critical point, and the output of a d = 2 circuit already reproduces the phase diagram with high accuracy (Fig. 2a). This QCNN can also be used for other Hamiltonian models belonging to the same phase, such as the S = 1 Haldane chain31 (see Methods).

Sample complexity

The performance of a QPR solver can be quantified by sample complexity11: what is the expected number of copies of the input state required to identify its quantum phase? We demonstrate that the sample complexity of our exact QCNN circuit is substantially better than that of conventional methods. In principle, \({\cal{P}}\) can be detected by measuring a non-zero expectation value of string order parameters (SOPs)32,33 \(S\) such as

$$S_{ab} = Z_aX_{a + 1}X_{a + 3}\,...\,X_{b - 3}X_{b - 1}Z_b$$
(3)

In practice, however, the expectation values of SOP vanish near the phase boundary due to diverging correlation length33; since quantum projection noise is maximal in this vicinity, many experimental repetitions are required to affirm a non-zero expectation value. In contrast, the QCNN output is much sharper near the phase transition, so fewer repetitions are required.

Quantitatively, given some |ψin〉 and SOP \(S\), a projective measurement of S can be modelled as a (generalized) Bernoulli random variable, where the outcome is 1 with probability p = (〈ψin|S|ψin〉 + 1) / 2 and −1 with probability 1 − p (since \({S}^2\) equals the identity operator); after M binary measurements, we estimate p, and an estimate p > p0 = 0.5 signifies \(\left| {\psi _{{\mathrm{in}}}} \right\rangle \in {\cal{P}}\). We define the sample complexity Mmin as the minimum M needed to test whether p > p0 with 95% confidence using an arcsine variance-stabilizing transformation34:

$$M_{{\mathrm{min}}} = \frac{{1.96^2}}{{({\mathrm{arcsin}} \sqrt p - {\mathrm{arcsin}} \sqrt {p_0} )^2}}$$
(4)

Similarly, the sample complexity for a QCNN can be determined by replacing 〈ψin|S|ψin〉 by the QCNN output expectation value in the expression for p.
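Equation (4) translates directly into code. The following minimal sketch takes a measured expectation value (of an SOP, or of the QCNN output) and returns the corresponding Mmin; the 95% confidence level enters through the z-value 1.96:

```python
import numpy as np

def sample_complexity(expval, p0=0.5, z=1.96):
    """Equation (4): minimum number of repetitions to certify p > p0 at
    95% confidence, using the arcsine variance-stabilizing transformation.
    expval is the measured expectation value of S (or of the QCNN output)."""
    p = (expval + 1.0) / 2.0  # Bernoulli success probability
    return z**2 / (np.arcsin(np.sqrt(p)) - np.arcsin(np.sqrt(p0)))**2
```

As the expectation value approaches zero (p approaching p0 near the phase boundary), the denominator vanishes and the required number of copies diverges, which is why a sharper classifier output directly reduces sample complexity.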

Figure 2d shows the sample complexity for the QCNN at various depths and for SOPs of different lengths. The QCNN clearly requires substantially fewer input copies throughout the parameter regime, especially near criticality. More importantly, while the SOP sample complexity is essentially independent of string length, the QCNN sample complexity consistently improves with increasing depth and is limited only by finite-size effects in our simulations. In particular, compared with SOPs, the QCNN reduces sample complexity by a factor that scales exponentially with the QCNN depth in numerically accessible regimes (inset). Such scaling arises from the iterative QEC performed at each depth and is not expected from measurements of any simple (potentially nonlocal) observables. We show in the Methods that our QCNN circuit measures a multiscale string order parameter—a sum of products of exponentially many different SOPs that remains sharp up to the phase boundary.

MERA and QEC

Further insights into the performance of the QCNN are revealed by interpreting it in terms of MERA and QEC. In particular, our QCNN is specifically designed to contain the MERA representation of the 1D cluster state (|ψ0〉) such that it becomes a stable fixed point. When |ψ0〉 is used as input, each convolution–pooling unit produces the same state |ψ0〉 with reduced system size in the unmeasured qubits, while yielding deterministic outcomes (X = 1) in the measured qubits. The fully connected layer measures the SOP for |ψ0〉. When an input wavefunction is perturbed away from |ψ0〉, our QCNN corrects such ‘errors’. For example, if a single X error occurs, the first pooling layer identifies its location, and controlled unitary operations correct the error propagated through the circuit (Fig. 3). Similarly, if an initial state has multiple, sufficiently separated errors (possibly in coherent superpositions), the error density after several iterations of convolution and pooling layers will be substantially smaller35. If the input state converges to the fixed point, our QCNN classifies it into the SPT phase with high fidelity. Clearly, this mechanism resembles the classification of quantum phases based on renormalization-group flow.

Fig. 3: MERA and QEC in the QCNN circuit.
figure 3

When the input state is the cluster state with a single-qubit X error (left), the convolution–pooling unit of our circuit identifies and corrects the error, while reducing the system size by a factor of 3 (right). This process resembles the combination of MERA and QEC.

Obtaining QCNN from training procedure

Having analytically illustrated the computational power of the QCNN circuit model, we now demonstrate how a QCNN for \({\cal{P}}\) can also be obtained using the learning procedure. Details of the hyperparameters of the QCNN can be found in the Methods and Supplementary Fig. 2. Initially, all unitaries are set to random values. As classically simulating our training procedure requires expensive computational resources, we focus on a relatively small system with N = 15 spins and QCNN depth d = 1; there are a total of 1,309 parameters to be learned (see Methods). Our training data consists of 40 evenly spaced points along the line h2 = 0, where the Hamiltonian is exactly solvable by the Jordan–Wigner transformation. Using gradient descent with the MSE function (1), we iteratively update the unitaries until convergence (see Methods). The classification output of the resulting QCNN for generic h2 is shown in Fig. 4. This QCNN accurately reproduces the two-dimensional phase diagram over the entire parameter regime, despite being trained only on samples from a set of solvable points that do not even cross the lower phase boundary.

Fig. 4: Output of a trained QCNN.
figure 4

We numerically optimize our QCNN for a system of N = 15 spins and d = 1 starting from random initial values. The training data points are 40 equally spaced points h1 ∈ [0, 2] along the line h2 = 0 where the Hamiltonian is solvable by the Jordan–Wigner transformation (grey dots). The blue and red diamonds are phase boundary points extracted from infinite-size DMRG numerical simulations, while the background shading (colour scale) represents the expectation value of QCNN output.

This example illustrates how the QCNN structure avoids overfitting to training data with its exponentially reduced number of parameters. While the training dataset for this particular QPR problem consists of solvable points, more generally, such a dataset can be obtained by using traditional methods (such as measuring SOPs) to classify representative states that can be efficiently generated either numerically or experimentally36,37.

Optimizing QEC

As seen in the previous example, the QCNN’s architecture enables one to perform effective QEC. We next leverage this feature to design a new QEC code that is itself optimized for a given error model. More specifically, any QCNN circuit (and its inverse) can be viewed as a decoding (encoding) quantum channel between the physical input qubits and the logical output qubit. The encoding scheme introduces sets of new qubits in a predetermined state, for example |0〉, while the decoding scheme performs measurements (Fig. 5a). Given an error channel \({\cal{N}}\), our aim is therefore to maximize the recovery fidelity

$$f = \mathop {\sum}\limits_{\left| {\psi _l} \right\rangle \in \{ \pm x,y,z\} } {\left\langle {\psi _l} \right|} {\cal{M}}^{ - 1}\left( {{\cal{N}}\left( {{\cal{M}}\left( {\left| {\psi _l} \right\rangle \left\langle {\psi _l} \right|} \right)} \right)} \right)\left| {\psi _l} \right\rangle$$
(5)

where \({\cal{M}}( {{\cal{M}}^{ - 1}})\) is the encoding (decoding) scheme generated by a QCNN circuit, and |±x, y, z〉 are the ±1 eigenstates of the Pauli matrices. Thus, our method simultaneously optimizes both encoding and decoding schemes, while ensuring their efficient implementation in realistic systems. The variational optimization can be carried out with an unknown \({\cal{N}}\), since f can be evaluated experimentally.
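As a minimal illustration of equation (5), the sketch below evaluates the recovery fidelity for a single qubit with trivial encoding and decoding (\({\cal{M}}\) taken to be the identity) under a bit-flip channel. The 9-qubit QCNN encoder itself is not implemented here; this only shows how f is assembled from the six Pauli eigenstates:

```python
import numpy as np

# The ±1 eigenstates of the Pauli matrices: |±x>, |±y>, |±z>
STATES = [np.array(v, dtype=complex) / np.linalg.norm(np.array(v, dtype=complex))
          for v in [[1, 1], [1, -1], [1, 1j], [1, -1j], [1, 0], [0, 1]]]

def bit_flip(rho, p):
    """Single-qubit bit-flip channel: N(rho) = (1 - p) rho + p X rho X."""
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    return (1 - p) * rho + p * (X @ rho @ X)

def recovery_fidelity(channel):
    """Equation (5) with M = identity: sum of recovery fidelities
    <psi| N(|psi><psi|) |psi> over the six Pauli eigenstates."""
    return sum((psi.conj() @ channel(np.outer(psi, psi.conj())) @ psi).real
               for psi in STATES)
```

With no noise the sum equals 6; a bit-flip channel leaves |±x〉 invariant and degrades the other four states, so f = 2 + 4(1 − p). In the full scheme, the channel is sandwiched between the learned encoder and decoder instead of acting on the bare qubit.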

Fig. 5: QCNN for optimizing quantum error correction.
figure 5

a, Schematic for using QCNNs to optimize QEC. The inverse QCNN encodes a single logical qubit |ψl〉 into nine physical qubits, which undergo noise \({\cal{N}}\). The QCNN then decodes these to obtain the logical state ρ. Our aim is to maximize 〈ψl|ρ|ψl〉. b, Logical error rate of Shor code (blue) versus a learned QEC code (orange) in a correlated error model. The input error rate is defined as the sum of all probabilities pμ and pxx. The performance of the Shor code is worse than performing no error correction at all (identity, grey line), while the optimized code can still substantially reduce the error rate.

To illustrate the potential of this procedure, we consider a two-layer QCNN with N = 9 physical qubits and 126 variational parameters (Fig. 5a and Methods). This particular architecture includes the nested (classical) repetition codes and the 9-qubit Shor code38; in the following, we compare our performance to the better of the two. We consider three different input error models: (1) independent single-qubit errors on all qubits with equal probabilities pμ for μ = X, Y and Z errors; (2) independent single-qubit errors with anisotropic probabilities px ≠ py = pz; and (3) independent single-qubit anisotropic errors with additional two-qubit correlated errors XiXi+1 occurring with probability pxx.

On initializing all QCNN parameters to random values and numerically optimizing them to maximize f, we find that our model produces the same logical error rate as known codes in case (1) but can reduce the error rate by a constant factor of up to 50% in case (2), depending on the specific input error probability ratios (see Methods and Supplementary Fig. 4). In case (3), the optimized QEC code performs substantially better than known codes (Fig. 5b). As the Shor code is only guaranteed to correct arbitrary single-qubit errors, it performs more poorly than using no error correction, while the optimized QEC code performs much better. This example demonstrates the power of using QCNNs to obtain and optimize new QEC codes for realistic, a priori unknown error models.
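The crossover in Fig. 5b, where a fixed code can perform worse than no encoding at all, already appears in the simplest setting. The sketch below gives the analytic logical error rate of the 3-qubit repetition code under independent bit flips; this is a toy illustration only, not the paper's optimized 9-qubit code or its correlated error model:

```python
def logical_error_rep3(p):
    """Logical error probability of the 3-qubit repetition code under
    independent bit flips with probability p: majority vote fails when
    two or three of the qubits flip."""
    return 3 * p**2 * (1 - p) + p**3
```

Below the code's threshold the logical rate is suppressed relative to the physical rate, while above it encoding actively hurts, mirroring the behaviour of the Shor code once correlated XX errors violate its design assumptions.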

Experimental realizations

Our QCNN architecture can be efficiently implemented on several state-of-the-art experimental platforms. The key ingredients for realizing QCNNs include the efficient preparation of quantum many-body input states, the application of two-qubit gates at various length scales and projective measurements. As in stabilizer-based QEC, the measurements of intermediate qubits and feed-forwarding can be replaced by controlled two-qubit unitary operations so that measurements are performed only at the end of an experimental sequence. These capabilities have already been demonstrated in multiple programmable quantum simulators consisting of N ≥ 50 qubits based on trapped neutral atoms and ions, or superconducting qubits39,40,41,42.

As an example, we present a feasible protocol for near-term implementation of our exact cluster model QCNN circuit via neutral Rydberg atoms39,43, where long-range dipolar interactions allow high-fidelity entangling gates44 among distant qubits in a variable geometric arrangement. The qubits can be encoded in the hyperfine ground states, where one of the states can be coupled to the Rydberg level to perform efficient entangling operations via the Rydberg-blockade mechanism44; an explicit implementation scheme for every gate in Fig. 2b is provided in the Methods. Our QCNN at depth d with N input qubits requires at most \(\frac{{7N}}{2}(1 - 3^{1 - d}) + N3^{1 - d}\) multi-qubit operations and 4d single-qubit rotations. For a realistic effective coupling strength Ω ≈ 2π × 10–100 MHz and single-qubit coherence time τ ≈ 200 μs limited by the Rydberg state lifetime, approximately Ωτ ≈ 2π × 103–104 multi-qubit operations can be performed, and a d = 4 QCNN on N ≈ 100 qubits is feasible. These estimates are reasonably conservative as we have not considered advanced control techniques such as pulse-shaping45 or potentially parallelizing independent multi-qubit operations.
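The resource estimate quoted above can be evaluated directly. The small helper below is a hypothetical convenience function, not part of any implementation; it simply encodes the gate-count formulas from the text:

```python
def gate_counts(N, d):
    """Resource estimate for the Rydberg implementation described in the
    text: at most (7N/2)(1 - 3^(1-d)) + N * 3^(1-d) multi-qubit operations
    and 4d single-qubit rotations for a depth-d QCNN on N input qubits."""
    multi = 3.5 * N * (1 - 3.0 ** (1 - d)) + N * 3.0 ** (1 - d)
    single = 4 * d
    return multi, single
```

For N = 100 and d = 4 this gives a few hundred multi-qubit operations, comfortably within the Ωτ ≈ 10³–10⁴ budget estimated from the coupling strength and coherence time.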

Outlook

These considerations indicate that QCNNs provide a promising quantum machine learning paradigm. Several interesting generalizations and future directions can be considered. For example, while we have only presented the QCNN circuit structure for recognizing 1D phases, it is straightforward to generalize the model to higher dimensions, where phases with intrinsic topological order such as the toric code are supported46,47,48. Such studies could potentially identify nonlocal order parameters with low sample complexity for lesser-understood phases such as quantum spin liquids49 or anyonic chains50. To recognize more exotic phases, we could also relax the translation-invariance constraints, resulting in \(O(N\log (N))\) parameters for system size N, or use ancilla qubits to implement parallel feature maps following traditional CNN architecture. Further extensions could incorporate optimizations for fault-tolerant operations on QEC code spaces. Finally, while we have used a finite-difference scheme to compute gradients in our learning demonstrations, the structural similarity of a QCNN to its classical counterpart suggests the possibility of adopting more efficient schemes such as backpropagation1.

Methods

Phase diagram and QCNN circuit simulations

The phase diagram in Fig. 2a was numerically obtained using the infinite-size density matrix renormalization group (DMRG) algorithm. We generally follow the method outlined in ref. 51 with 150 as the maximum bond dimension. To extract each data point in Fig. 2a, we numerically obtain the ground state energy density as a function of h2 for fixed h1 and compute its second-order derivative. The phase boundary points are identified from sharp peaks.

The simulation of our QCNN in Fig. 2b also uses matrix product state representations. We first obtain the input ground state wavefunction using finite-size DMRG51 with bond dimension D = 130 for a system of N = 135 qubits. We then perform the circuit operations by sequentially applying swap gates and two-qubit gates on the nearest neighboring qubits52. Each three-qubit gate is decomposed into two-qubit unitaries53. We find that increasing the bond dimension to D = 150 does not lead to any visible changes in our main figures, confirming a reasonable convergence of our method. The colour plot in Fig. 2a is similarly generated for a system of N = 45 qubits.

QCNN for the S = 1 Haldane chain

As discussed in the main text, the (spin-1/2) 1D cluster state belongs to an SPT phase protected by \({\Bbb Z}_2 \times {\Bbb Z}_2\) symmetry, a phase that also contains the celebrated S = 1 Haldane chain31. It is thus natural to ask whether this circuit can be used to detect the phase transition between the Haldane phase and an S = 1 paramagnetic phase, which we numerically demonstrate here.

The one-parameter family of Hamiltonians we consider for the Haldane phase is defined on a 1D chain of N spin-1 particles with open boundary conditions31:

$$H_{{\mathrm{Haldane}}} = J\mathop {\sum}\limits_{j = 1}^{N - 1} {{\mathbf{S}}_j} \cdot {\mathbf{S}}_{j + 1} + \omega \mathop {\sum}\limits_{j = 1}^N {(S_j^z)^2}$$
(6)

In this equation, Sj denotes the vector of S = 1 spin operators at site j, and J, ω are parameters of the Hamiltonian. The system is protected by a \({\Bbb{Z}}_2 \times {\Bbb{Z}}_2\) symmetry generated by global π-rotations of every spin around the X and Y axes: \(R_{x} = \mathop {\prod}\limits_{j} {e^{i{\pi}S_{j}^{x}}}\), \(R_y = \mathop {\prod}\limits_j {\mathrm{e}}^{i{\pi} S_{j}^{y}}\). When ω is zero or small compared to J, the ground state belongs to the SPT phase, but when ω/J is sufficiently large, the ground state becomes paramagnetic31.

To apply our QCNN circuit to this Haldane phase, we must first identify a quasilocal isometric map U between the two models, because their representations of the symmetry group are distinct. More specifically, since the cluster model has a \({\Bbb Z}_2 \times {\Bbb Z}_2\) symmetry generated by \(X_{{\mathrm{even}}({\mathrm{odd}})} = \mathop {\prod}\limits_{i \in {\mathrm{even}}({\mathrm{odd}})} {X_i}\), we require \(UR_xU^\dagger = X_{{\mathrm{odd}}}\) and \(UR_yU^\dagger = X_{{\mathrm{even}}}\). Such a map can be constructed following ref. 54. Intuitively, it extends the local Hilbert space of a spin-1 particle by introducing a spin singlet state |s〉 and mapping it to a pair of spin-1/2 particles: \(\left| x \right\rangle \mapsto \left| { + - } \right\rangle\), \(\left| y \right\rangle \mapsto - \left| { - + } \right\rangle\), \(\left| z \right\rangle \mapsto - i\left| { - - } \right\rangle\), \(\left| s \right\rangle \mapsto \left| { + + } \right\rangle\). Here, |±〉 denote the ±1 eigenstates of the (spin-1/2) Pauli matrix X, and |μ〉 denotes a spin-1 state defined by \(R_\nu \left| \mu \right\rangle = ( - 1)^{\delta _{\mu ,\nu } + 1}\left| \mu \right\rangle\) (μ, ν ∈ {x, y, z}). The QCNN circuit for the Haldane chain thus consists of applying U followed by the circuit presented in the main text.

Supplementary Fig. 1 shows the QCNN output for an input system of N = 54 spin-1 particles at d = 1, …, 4, obtained using matrix product state simulations with D = 160. For this system size, we numerically identified the critical point as ω/J = 1.035 ± 0.005 by using DMRG to obtain the second derivative of energy density as a function of ω and J. The QCNN provides accurate identification of the phase transition.

Multiscale string order parameters

We examine the final operator measured by our circuit that recognizes the SPT phase in the Heisenberg picture. Although a QCNN performs non-unitary measurements in the pooling layers, similar to QEC circuits29, one can postpone all measurements to the end and replace pooling layers by unitary-controlled gates acting on both measured and unmeasured qubits. In this way, the QCNN is equivalent to measuring a non-local observable

$${\cal{O}} = (U_{{\mathrm{CP}}}^{(d)}...U_{{\mathrm{CP}}}^{(1)})^\dagger Z_{i - 1}X_iZ_{i + 1}(U_{{\mathrm{CP}}}^{(d)}...U_{{\mathrm{CP}}}^{(1)})$$
(7)

where i is the index of the measured qubit in the final layer and \(U_{{\mathrm{CP}}}^{(l)}\) is the unitary corresponding to the convolution–pooling unit at depth l. A more explicit expression of \({\cal{O}}\) can be obtained by commuting UCP with the Pauli operators, which yields recursive relations:

$$U_{{\mathrm{CP}}}^\dagger X_iU_{{\mathrm{CP}}} = X_{\tilde i - 2}X_{\tilde i}X_{\tilde i + 2}$$
(8)
$$U_{{\mathrm{CP}}}^\dagger Z_iU_{{\mathrm{CP}}} = \frac{1}{2}(Z_{\tilde i} + Z_{\tilde i - 2}X_{\tilde i - 1} + X_{\tilde i + 1}Z_{\tilde i + 2} - Z_{\tilde i - 2}X_{\tilde i - 1}Z_{\tilde i}X_{\tilde i + 1}Z_{\tilde i + 2})$$
(9)

\(\tilde{i}\) enumerates every qubit at depth l − 1, including those measured in the pooling layer. It follows that an SOP of the form ZXXXZ at depth l transforms into a weighted linear combination of 16 products of SOPs at depth l − 1. Thus, instead of measuring a single SOP, our QCNN circuit measures a sum of products of exponentially many different SOPs:

$${\cal{O}} = \mathop {\sum}\limits_{ab} {C_{ab}^{(1)}} {S}_{ab} + \mathop {\sum}\limits_{a_1b_1a_2b_2} {C_{a_1b_1a_2b_2}^{(2)}} {S}_{a_1b_1}{S}_{a_2b_2} + \cdots$$
(10)

\({\cal{O}}\) can be viewed as a multiscale SOP with coefficients computed recursively in d using equations (8) and (9). This allows the QCNN to produce a sharp classification output even when the correlation length is as long as \(3^d\).

Demonstration of learning procedure for QPR

To perform our learning procedure in a QPR problem, we choose the hyperparameters for the QCNN as shown in Supplementary Fig. 2. This hyperparameter structure can be used for generic (1D) phases, and is characterized by a single integer n that determines the reduction of system size in each convolution–pooling layer, L → L/n. (Supplementary Fig. 2 shows the special case where n = 3.) The first convolution layer involves (n + 1)-qubit unitaries starting on every nth qubit. This is followed by n layers of n-qubit unitaries arranged as shown in Supplementary Fig. 2. The pooling layer measures n − 1 out of every contiguous block of n qubits; each of these is associated with a unitary Vj applied to the remaining qubit, depending on the measurement outcome. This set of convolution and pooling layers is repeated d times. Finally, the fully connected layer consists of an arbitrary unitary on the remaining \(N/n^d\) qubits, and the classification output is given by the measurement output of the middle qubit (or any fixed choice of one of them). For our example, we choose n = 3 because the Hamiltonian in equation (2) involves three-qubit terms.
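The size bookkeeping L → L/n per convolution–pooling layer can be sketched with a small hypothetical helper (`layer_sizes` is our own name, not from the paper):

```python
def layer_sizes(N, n, d):
    """Qubit counts entering each of the d convolution-pooling layers,
    followed by the count reaching the fully connected layer (L -> L/n)."""
    sizes = [N]
    for _ in range(d):
        assert sizes[-1] % n == 0, "size must be divisible by n at every layer"
        sizes.append(sizes[-1] // n)
    return sizes

# The example in the text: N = 15 spins, n = 3, d = 1 leaves 5 qubits
# for the fully connected layer.
print(layer_sizes(15, 3, 1))  # [15, 5]
```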

In our simulations, we consider only N = 15 spins and d = 1, because classically simulating larger quantum circuits requires exponentially more resources. We parameterize unitaries as exponentials of generalized a × a Gell-Mann matrices {Λi}, where \(a = 2^v\) and v is the number of qubits involved in the unitary55: \(U = {\mathrm{exp}} \left( { - i\mathop {\sum}\limits_j {c_j} {\it{\Lambda }}_j} \right)\).

This parameterization is used directly for the unitaries in the convolution layers C2–C4, the pooling layer and the fully connected layer. For the first convolution layer C1, we restrict the choice of U1 to a product of six two-qubit unitaries between each possible pair of qubits: U1 = U(23)U(24)U(13)U(14)U(12)U(34), where U(αβ) is a two-qubit unitary acting on the qubits indexed by α and β. Such a decomposition is useful when considering experimental implementation.
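A minimal sketch of this parameterization, constructing the generalized Gell-Mann matrices and exponentiating their Hermitian sum via an eigendecomposition (the function names are ours):

```python
import numpy as np

def gell_mann(a):
    """Generalized a x a Gell-Mann matrices: a**2 - 1 traceless Hermitian generators."""
    mats = []
    for j in range(a):                      # symmetric and antisymmetric pairs
        for k in range(j + 1, a):
            sym = np.zeros((a, a), dtype=complex)
            sym[j, k] = sym[k, j] = 1.0
            asym = np.zeros((a, a), dtype=complex)
            asym[j, k], asym[k, j] = -1j, 1j
            mats += [sym, asym]
    for l in range(1, a):                   # diagonal generators
        diag = np.zeros((a, a), dtype=complex)
        diag[np.arange(l), np.arange(l)] = 1.0
        diag[l, l] = -float(l)
        mats.append(np.sqrt(2.0 / (l * (l + 1))) * diag)
    return mats

def unitary_from_params(c, a):
    """U = exp(-i * sum_j c_j Lambda_j), via eigendecomposition of the Hermitian sum."""
    H = sum(cj * L for cj, L in zip(c, gell_mann(a)))
    w, V = np.linalg.eigh(H)
    return (V * np.exp(-1j * w)) @ V.conj().T

rng = np.random.default_rng(0)
a = 4                                       # a two-qubit unitary: v = 2, a = 2**v
U = unitary_from_params(rng.uniform(0, 2 * np.pi, a * a - 1), a)
assert np.allclose(U @ U.conj().T, np.eye(a))  # unitary by construction
```

A v-qubit unitary is thus specified by a² − 1 real coefficients (15 for a two-qubit gate, 63 for a three-qubit gate, as used below for QEC).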

In the QCNN learning procedure, all parameters cμ are set to random values between 0 and 2π for the unitaries {Ui, Vj, F}. In every iteration of gradient descent, we compute the derivative of the mean-squared error function (equation (1)) to first order with respect to each cμ by using the finite-difference method:

$$\frac{{\partial {\mathrm{MSE}}}}{{\partial c_\mu }} = \frac{1}{{2\epsilon }}\left( {{\mathrm{MSE}}(c_\mu + \epsilon ) - {\mathrm{MSE}}(c_\mu - \epsilon )} \right) + O(\epsilon ^2)$$
(11)

Each coefficient is thus updated as \(c_\mu \mapsto c_\mu - \eta \frac{{\partial {\mathrm{MSE}}}}{{\partial c_\mu }}\), where η is the learning rate for that iteration. We compute the learning rate using the bold driver technique from machine learning, where η is increased by 5% if the error has decreased from the previous iteration and decreased by 50% otherwise56. We repeat the gradient descent procedure until the error function changes on the order of 10−5 between successive iterations. In our simulations, we use \(\epsilon = 10^{ - 4}\) for the gradient computation and begin with an initial learning rate of η0 = 10.
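Putting equation (11) and the bold-driver rule together, the update loop can be sketched as follows. The quadratic `cost` is a toy stand-in for the true MSE, which requires evaluating the QCNN circuit; the rejection-on-worsening variant of the bold driver used here is one common form of the technique:

```python
import numpy as np

def finite_diff_grad(f, c, eps=1e-4):
    """Central-difference gradient, equation (11), accurate to O(eps**2)."""
    g = np.zeros_like(c)
    for mu in range(len(c)):
        step = np.zeros_like(c)
        step[mu] = eps
        g[mu] = (f(c + step) - f(c - step)) / (2 * eps)
    return g

def bold_driver_descent(f, c0, eta0=10.0, tol=1e-5, max_iter=10_000):
    """Gradient descent with the bold-driver rule: eta grows by 5% after an
    improving step and is halved (with the step rejected) otherwise."""
    c = np.array(c0, dtype=float)
    eta, prev = eta0, f(c)
    for _ in range(max_iter):
        c_new = c - eta * finite_diff_grad(f, c)
        cur = f(c_new)
        if abs(prev - cur) < tol:   # stop when the error barely changes
            break
        if cur < prev:              # error decreased: accept and speed up
            c, prev, eta = c_new, cur, eta * 1.05
        else:                       # error increased: reject and slow down
            eta *= 0.5
    return c

# Toy quadratic stand-in for the MSE of equation (1); minimum at c = 1.
cost = lambda c: float(np.sum((c - 1.0) ** 2))
c_opt = bold_driver_descent(cost, np.zeros(4))
```

The large initial learning rate η0 = 10 is harmless under this rule: early overshooting steps are rejected and η is halved until progress resumes.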

Construction of the QCNN circuit

To construct the exact QCNN circuit in Fig. 2b, we follow the guidelines discussed in the main text. Specifically, we design the convolution and pooling layers to satisfy the following two important properties:

  1. Fixed-point criterion. If the input is a cluster state |ψ0〉 of L spins, the output of the convolution–pooling layers is a cluster state |ψ0〉 of L/3 spins, with all measurements deterministically yielding |0〉.

  2. QEC criterion. If the input is not |ψ0〉 but instead differs from |ψ0〉 at one site by an error that commutes with the global symmetry, the output should still be a cluster state of L/3 spins, but at least one of the measurements will yield |1〉.

These two properties are desirable for any quantum circuit implementation of RG flow for performing QPR.

In the specific case of our Hamiltonian, the ground state (1D cluster state) is a graph state, which can be efficiently obtained by applying a sequence of controlled-phase gates to a product state. This simplifies the construction of the MERA representation for the fixed-point criterion. To satisfy the QEC criterion, we treat the ground state manifold of the unperturbed Hamiltonian \(H = - J\mathop {\sum}\limits_i {Z_i} X_{i + 1}Z_{i + 2}\) as the code space of a stabilizer code with stabilizers {ZiXi+1Zi+2}. The remaining degrees of freedom in the QCNN convolution and pooling layers are then specified such that the circuit detects and corrects the error (that is, it measures at least one |1〉 and prevents propagation to the next layer) when a single-qubit X error is present.
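The stabilizer picture underlying the QEC criterion can be illustrated directly: the sketch below builds a small 1D cluster state, confirms that the bulk stabilizers ZXZ all yield +1, and shows that a single-qubit X error flips the stabilizers that overlap it in Z, exactly as a syndrome measurement would detect (a minimal check, not the full QCNN circuit):

```python
import numpy as np
from functools import reduce

X = np.array([[0, 1], [1, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
I2 = np.eye(2)

def op_chain(ops_at, n):
    """Tensor product with the given single-qubit ops at the listed sites."""
    ops = [I2] * n
    for site, op in ops_at:
        ops[site] = op
    return reduce(np.kron, ops)

def cluster_state(n):
    """1D cluster (graph) state: CZ gates on nearest neighbours of |+>^n."""
    psi = np.full(2 ** n, 2 ** (-n / 2), dtype=complex)
    for i in range(n - 1):
        cz = np.ones(2 ** n)
        for b in range(2 ** n):
            # qubit 0 is the most significant bit, matching np.kron ordering
            if (b >> (n - 1 - i)) & 1 and (b >> (n - 2 - i)) & 1:
                cz[b] = -1
        psi = cz * psi
    return psi

n = 5
psi = cluster_state(n)
# Bulk stabilizers Z_{i-1} X_i Z_{i+1} all yield +1 on the cluster state...
for i in range(1, n - 1):
    S = op_chain([(i - 1, Z), (i, X), (i + 1, Z)], n)
    assert np.allclose(S @ psi, psi)
# ...and a single-qubit X error flips a stabilizer that overlaps it in Z.
err = op_chain([(2, X)], n) @ psi
S1 = op_chain([(0, Z), (1, X), (2, Z)], n)
assert np.allclose(S1 @ err, -err)
```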

QCNN for general QPR problems

Our interpretation of QCNNs in terms of MERA and QEC motivates their application for recognizing more generic quantum phases. For any quantum phase \({\cal{P}}\) whose RG fixed-point wavefunction \(\left| {\psi _0({\cal{P}})} \right\rangle\) has a tensor network representation in isometric or G-isometric form57 (Supplementary Fig. 3a), one can systematically construct a corresponding QCNN circuit. This family of quantum phases includes all 1D SPT and 2D string-net phases47,57,58. In these cases, one can explicitly construct a commuting parent Hamiltonian for \(\left| {\psi _0({\cal{P}})} \right\rangle\) and a MERA structure in which \(\left| {\psi _0({\cal{P}})} \right\rangle\) is a fixed-point wavefunction (Supplementary Fig. 3a for 1D systems). The diagrammatic proof of this fixed-point property is given in Supplementary Fig. 3b. Furthermore, any ‘local error’ perturbing an input state away from \(\left| {\psi _0({\cal{P}})} \right\rangle\) can be identified by measuring a fraction of terms in the parent Hamiltonian, similar to syndrome measurements in stabilizer-based QEC29. Then, a QCNN for \({\cal{P}}\) simply consists of the MERA for \(\left| {\psi _0({\cal{P}})} \right\rangle\) and a nested QEC scheme in which an input state with error density below the QEC threshold59 ‘flows’ to the RG fixed point. Such a QCNN can be optimized via our learning procedure.

While our generic learning protocol begins with completely random unitaries, as in the classical case1, this initialization may not be the most efficient for gradient descent. Instead, motivated by deep learning techniques such as pre-training1, a better initial parameterization would consist of a MERA representation of \(\left| {\psi _0({\cal{P}})} \right\rangle\) and one choice of nested QEC. With such an initialization, the learning procedure serves to optimize the QEC scheme, expanding its threshold to the target phase boundary (Supplementary Fig. 3c).

Experimental resource analysis

To compute the gate depth of the cluster model QCNN circuit in a Rydberg atom implementation, we analyse each gate shown in Fig. 2b. By postponing pooling layer measurements to the end of the circuit, the multi-qubit gates required are

$${\mathrm{C}}_z{\mathrm{Z}}_{ij} = {\mathrm{e}}^{i\pi ( - {\mathbf{1}} + Z_i)( - {\mathbf{1}} + Z_j)/4}$$
(12)
$${\mathrm{C}}_x{\mathrm{Z}}_{ij} = {\mathrm{e}}^{i\pi ( - {\mathbf{1}} + X_i)( - {\mathbf{1}} + Z_j)/4}$$
(13)
$${\mathrm{C}}_x{\mathrm{C}}_x{\mathrm{X}}_{ijk} = {\mathrm{e}}^{i\pi ( - {\mathbf{1}} + X_i)( - {\mathbf{1}} + X_j)( - {\mathbf{1}} + X_k)/8}$$
(14)

By using Rydberg blockade-mediated controlled gates60, it is straightforward to implement CzZij and \({\mathrm{C}}_z{\mathrm{C}}_z{\mathrm{Z}}_{ijk} = e^{i\pi ( - {\mathbf{1}} + Z_i)( - {\mathbf{1}} + Z_j)( - {\mathbf{1}} + Z_k)/8}\). The desired CxZij and CxCxXijk gates can then be obtained by conjugating CzZij and CzCzZijk by single-qubit rotations. For an input size of N spins, the kth convolution–pooling unit thus applies 4N/3k−1 CzZij gates, N/3k−1 CxCxXijk gates and 2N/3k−1 layers of CxZij gates. The depth of single-qubit rotations required is 4d, as these rotations can be implemented in parallel on all N qubits. Finally, the fully connected layer consists of N31−d CzZij gates. Thus, the total number of multi-qubit operations required for a QCNN of depth d operating on N spins is \(\frac{{7N}}{2}(1 - 3^{1 - d}) + N3^{1 - d}\). Note that we do not need to use SWAP gates since the Rydberg interaction is long-range.
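As a consistency check on the closed form, \(\frac{{7N}}{2}(1 - 3^{1 - d})\) equals the geometric sum \(7N\mathop {\sum}\nolimits_{k = 1}^{d - 1} {3^{ - k}}\); the sketch below verifies this algebra (the per-layer grouping inside `geometric_sum_count` is our reading of the text, not a statement from the paper):

```python
def multiqubit_gate_count(N, d):
    """Closed form from the text: (7N/2)(1 - 3**(1-d)) + N * 3**(1-d)."""
    return 3.5 * N * (1 - 3.0 ** (1 - d)) + N * 3.0 ** (1 - d)

def geometric_sum_count(N, d):
    """The same total written as a geometric series, 7N * sum_{k=1}^{d-1} 3**(-k),
    plus the fully connected layer's N * 3**(1-d) gates."""
    return 7 * N * sum(3.0 ** (-k) for k in range(1, d)) + N * 3.0 ** (1 - d)

# The two forms agree for any depth; at d = 1 only the fully connected
# layer's N gates remain.
assert multiqubit_gate_count(81, 1) == 81.0
for d in range(1, 7):
    assert abs(multiqubit_gate_count(81, d) - geometric_sum_count(81, d)) < 1e-9
```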

Demonstration of learning procedure for QEC

To obtain the QEC code considered in the main text, we consider a QCNN with N = 9 input physical qubits and simulate the circuit evolution of its \(2^N \times 2^N\) density matrix exactly. Strictly speaking, our QCNN has three layers: a three-qubit convolution layer U1, a 3-to-1 pooling layer and a 3-to-1 fully connected layer U2. Without loss of generality, we may ignore the optimization over the pooling layer by absorbing its effect into the first convolution layer, leading to the effective two-layer structure shown in Fig. 5a. The generic three-qubit unitary operations U1 and U2 are parameterized using 63 Gell-Mann coefficients each.

As discussed in the main text, we consider three different error models: (1) independent single-qubit errors on all qubits with equal probabilities pμ for μ = X, Y and Z errors, (2) independent single-qubit errors on all qubits, with anisotropic probabilities px ≠ py = pz and (3) independent single-qubit anisotropic errors with additional two-qubit correlated errors XiXi+1 with probability pxx. More specifically, the first two error models are realized by applying a (generally anisotropic) depolarization quantum channel to each of the nine physical qubits:

$${\cal{N}}_{1,i}:\rho \mapsto \left (1 - \mathop {\sum}\limits_\mu {p_\mu } \right)\rho + \mathop {\sum}\limits_\mu {p_\mu } \sigma _i^\mu \rho \sigma _i^\mu$$
(15)

with Pauli matrices \({\sigma _{i}^{\mu}}\) for i ∈ {1, 2, …, 9} (the qubit indices are defined from bottom to top in Fig. 5a). For the anisotropic case, we trained the QCNN on various error models with the same total error probability px + py + pz = 0.001 but different relative ratios; the resulting ratio between the logical error probability of the Shor code and that of the QCNN code is plotted as a function of anisotropy in Supplementary Fig. 4. For strongly anisotropic models, the QCNN outperforms the Shor code, while for nearly isotropic models the Shor code is optimal and the QCNN achieves the same logical error rate.
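A one-qubit sketch of the (generally anisotropic) depolarization channel of equation (15), restricted to a single site for illustration:

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)

def depolarize(rho, px, py, pz):
    """Anisotropic single-qubit depolarization, equation (15) on one site."""
    out = (1 - px - py - pz) * rho
    for p, sigma in ((px, X), (py, Y), (pz, Z)):
        out = out + p * (sigma @ rho @ sigma)
    return out

rho = np.array([[1, 0], [0, 0]], dtype=complex)       # |0><0|
rho_out = depolarize(rho, px=8e-4, py=1e-4, pz=1e-4)  # strongly X-biased noise
assert np.isclose(np.trace(rho_out).real, 1.0)        # trace preserving
assert np.isclose(rho_out[1, 1].real, 9e-4)           # X and Y errors flip |0> to |1>
```

In the simulation, one such channel acts on each of the nine physical qubits.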

For the correlated error model, we additionally apply a quantum channel:

$${\cal{N}}_{2,i}:\rho \mapsto (1 - p_{xx})\rho + p_{xx}X_iX_{i + 1}\rho X_iX_{i + 1}$$
(16)

for pairs of nearby qubits, that is, i ∈ {1, 2, 4, 5, 7, 8}. Such geometrically local correlations are motivated by experimental considerations. In this case, we train our QCNN circuit on a specific error model with parameter choices px = 5.8 × 10−3, py = pz = 2 × 10−3, pxx = 2 × 10−4 and evaluate the logical error probabilities for various physical error models with the same relative ratios but different total error per qubit px + py + pz + pxx. Our goal is to maximize the logical state overlap f defined in equation (5). In general, for an anisotropic logical error model with probabilities pμ for σμ logical errors, the overlap is \(f = (1 - 2\mathop {\sum}\limits_\mu {p_\mu } /3)\), since \(\left\langle { \pm \nu } \right|\sigma _\mu \left| { \pm \nu } \right\rangle = ( - 1)^{\delta _{\mu ,\nu } + 1}\); we therefore compute the total logical error probability from f as 1.5(1 − f). If we naively apply gradient descent on f directly to both U1 and U2, we find that the optimization is easily trapped in a local optimum. Instead, we optimize the two unitaries U1 and U2 sequentially, similar to the layer-by-layer optimization in backpropagation for a conventional CNN1.

A few remarks are in order. First, since U1 is optimized prior to U2, one needs to devise an efficient cost function C1 that is independent of U2. In particular, simply maximizing f under an assumption on U2, for example that it equals the identity, may not be ideal, since such a choice does not capture the potential interplay between U1 and U2. Second, because U1 captures arbitrary single-qubit rotations, the definition of C1 should be basis-independent. Finally, we note that the tree structure of our circuit allows one to view the first layer as an independent quantum channel:

$${\cal{M}}_{U_1}:\rho \mapsto {\mathrm{tr}}_a\left[ {U_1{\cal{N}}\left( {U_1^\dagger \left( {\left| 0 \right\rangle \left\langle 0 \right| \otimes \rho \otimes \left| 0 \right\rangle \left\langle 0 \right|} \right)U_1} \right)U_1^\dagger } \right]$$
(17)

where \({\mathrm{tr}}_a[\, \cdot \,]\) denotes tracing over the ancilla qubits that are measured in the intermediate step. From this perspective, \({\cal{M}}_{U_1}\) describes an effective error model to be corrected by the second layer.

With these considerations, we optimize U1 such that the effective error model \({\cal{M}}_{U_1}\) becomes as classical as possible, that is, dominated by a ‘flip’ error along a certain axis with strongly suppressed ‘phase’ errors; the remaining, simpler errors can then be corrected by the second layer. More specifically, one may represent \({\cal{M}}_{U_1}\) as an affine map \({\cal{M}}_{U_{1}}:{\mathbf{r}} \mapsto M{\mathbf{r}} + {\mathbf{c}}\), where \({\mathbf{r}} \in {\Bbb R}^3\) is the Bloch vector of a qubit state \(\rho \equiv \frac{1}{2}{\mathbf{1}} + {\mathbf{r}} \cdot {\it{\sigma }}\), 1 is the identity operator and σ = (X, Y, Z) is the vector of Pauli matrices53. The singular values of the real matrix M encode the probabilities p1 ≥ p2 ≥ p3 for the three different types of errors. We choose the cost function for the first layer as \(C_1 = p_1^2 + p_2 + p_3\), which is relatively more sensitive to p2 and p3 than to p1 and ensures that the resulting optimized channel \({\cal{M}}_{U_1}\) is dominated by a single type of error (with probability p1). We note that M can be efficiently evaluated from a quantum device without knowing \({\cal{N}}\), by performing quantum process tomography on a single logical qubit. Once U1 is optimized, we use gradient descent to find an optimal U2 that maximizes f. As with QPR, gradients are computed via the finite-difference method, and the learning rate is determined by the bold driver technique1.
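The Bloch-matrix construction and the cost C1 can be sketched as follows. For simplicity we assume a diagonal Pauli channel, for which the flip probabilities can be inverted from the diagonal of M (the general case uses the singular values, as in the text); `bloch_matrix`, `error_probs` and `layer_cost` are our own names:

```python
import numpy as np

I2 = np.eye(2, dtype=complex)
X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.diag([1.0, -1.0]).astype(complex)
PAULIS = (X, Y, Z)

def bloch_matrix(channel):
    """Affine Bloch representation r -> M r + c of a single-qubit channel,
    recovered by probing it with the maximally mixed state and the three
    Pauli eigenstates (a linear-algebra analogue of process tomography)."""
    c = np.array([np.trace(s @ channel(I2 / 2)).real for s in PAULIS])
    M = np.zeros((3, 3))
    for j, sj in enumerate(PAULIS):
        out = channel((I2 + sj) / 2)   # input state with Bloch vector e_j
        M[:, j] = np.array([np.trace(s @ out).real for s in PAULIS]) - c
    return M, c

def error_probs(channel):
    """Flip probabilities (px, py, pz) of a diagonal Pauli channel, inverted
    from the Bloch-matrix diagonal: p_x = (1 + M_xx - M_yy - M_zz)/4, etc."""
    lam = np.diag(bloch_matrix(channel)[0])
    return np.array([(1 + lam[i] - lam[(i + 1) % 3] - lam[(i + 2) % 3]) / 4
                     for i in range(3)])

def layer_cost(channel):
    """C_1 = p1**2 + p2 + p3, with p1 >= p2 >= p3."""
    p = np.sort(error_probs(channel))[::-1]
    return p[0] ** 2 + p[1] + p[2]

# A pure X-flip channel: p2 = p3 = 0, so the cost reduces to the tolerated
# p1**2 term, as desired for a 'classical' effective error model.
flip = lambda rho: 0.9 * rho + 0.1 * (X @ rho @ X)
assert np.allclose(error_probs(flip), [0.1, 0.0, 0.0])
assert np.isclose(layer_cost(flip), 0.01)
```

Because C1 depends on the channel only through M, it is independent of U2 and of the single-qubit basis, as required.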