
2018 | Original Paper | Book Chapter

Taking Gradients Through Experiments: LSTMs and Memory Proximal Policy Optimization for Black-Box Quantum Control

Authors: Moritz August, José Miguel Hernández-Lobato

Published in: High Performance Computing

Publisher: Springer International Publishing


Abstract

In this work we introduce a general method for solving quantum control tasks, framing them as a class of reinforcement learning problems not yet discussed in the machine learning community. We analyze the structure of the reinforcement learning problems that typically arise in quantum physics and argue that agents parameterized by long short-term memory (LSTM) networks and trained via stochastic policy gradients yield a versatile method for solving them. In this context we introduce a variant of the proximal policy optimization (PPO) algorithm, called memory proximal policy optimization (MPPO), which is based on this analysis. We argue that, by design, our method can easily be combined with numerical simulations as well as with real experiments providing the reward signal. We demonstrate how the method can incorporate physical domain knowledge, and we present results of numerical experiments showing that it achieves state-of-the-art performance on several quantum control learning tasks with discrete and continuous control parameters.
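The abstract describes MPPO as a variant of proximal policy optimization; the chapter's own formulation is not reproduced on this page, but as background, standard PPO maximizes a clipped surrogate objective over the probability ratio between the new and old policies. The sketch below is a minimal, generic illustration of that clipped objective (not the authors' MPPO variant); the function name and the NumPy-based formulation are assumptions for illustration only.

```python
import numpy as np

def ppo_clipped_objective(new_logp, old_logp, advantages, eps=0.2):
    """Generic PPO clipped surrogate objective (illustrative sketch).

    new_logp, old_logp: log-probabilities of the taken actions under the
    current and the data-collecting policy; advantages: advantage estimates.
    """
    # Probability ratio r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t)
    ratio = np.exp(new_logp - old_logp)
    # Clipped surrogate: min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t)
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantages
    return np.minimum(unclipped, clipped).mean()
```

The clipping removes the incentive to move the policy far from the one that generated the data, which is what makes PPO-style methods attractive when each rollout corresponds to a costly quantum simulation or a real experiment.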


Metadata
Title
Taking Gradients Through Experiments: LSTMs and Memory Proximal Policy Optimization for Black-Box Quantum Control
Authors
Moritz August
José Miguel Hernández-Lobato
Copyright year
2018
DOI
https://doi.org/10.1007/978-3-030-02465-9_43