
2017 | OriginalPaper | Chapter

Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks

Authors : Vivek Veeriah, Shangtong Zhang, Richard S. Sutton

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

Representations are fundamental to artificial intelligence. The performance of a learning system depends on how the data is represented. Typically, these representations are hand-engineered using domain knowledge. Recently, the trend has been to learn representations through stochastic gradient descent in multi-layer neural networks, commonly called backprop. Learning representations directly from the incoming data stream reduces the human labour involved in designing a learning system; more importantly, it allows a learning system to scale up to difficult tasks. In this paper, we introduce a new incremental learning algorithm, called crossprop, which learns the incoming weights of hidden units based on the meta-gradient descent approach previously introduced by Sutton (1992) and Schraudolph (1999) for learning step-sizes. The final update equation introduces an additional memory parameter for each of these weights and generalizes the backprop update equation. Our empirical experiments show that crossprop learns and reuses its feature representation while tackling new and unseen tasks, whereas backprop relearns a new feature representation.
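The chapter's final crossprop update equation is not reproduced in this excerpt. As a rough illustration of the meta-gradient mechanism the abstract refers to, the sketch below implements Sutton's (1992) IDBD algorithm for a single linear unit: each weight carries a memory trace h_i and a learnable log step-size beta_i, adapted by meta-gradient descent. The function name `idbd_step`, the meta step-size `theta`, and the toy tracking task are our illustrative choices, not the paper's.

```python
import numpy as np

def idbd_step(w, h, beta, x, target, theta=0.01):
    """One IDBD update (Sutton, 1992). Per-weight step-sizes
    alpha_i = exp(beta_i) are themselves adapted by meta-gradient
    descent using a per-weight memory trace h_i -- the same style of
    mechanism that crossprop applies to the incoming weights of
    hidden units."""
    delta = target - w @ x                    # prediction error of the linear unit
    beta = beta + theta * delta * x * h       # meta-gradient step on log step-sizes
    alpha = np.exp(beta)                      # per-weight step-sizes
    w = w + alpha * delta * x                 # ordinary delta-rule step
    h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x  # decaying memory trace
    return w, h, beta

# Usage sketch: incrementally track a fixed linear target.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
h = np.zeros(3)
beta = np.full(3, np.log(0.05))              # initial step-sizes of 0.05
errs = []
for t in range(2000):
    x = rng.standard_normal(3)
    y = w_true @ x
    errs.append((y - w @ x) ** 2)
    w, h, beta = idbd_step(w, h, beta, x, y)
```

On this stationary, noiseless problem the squared error shrinks over time; the interesting regime for such meta-gradient methods, as the paper's experiments emphasize, is when the task changes and a good representation (or step-size assignment) can be reused rather than relearned.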


Literature
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate (2014). arXiv preprint arXiv:1409.0473
Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. (JAIR) 47, 253–279 (2013)
Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation (2014). arXiv preprint arXiv:1406.1078
Comon, P.: Independent component analysis, a new concept? Signal Process. 36(3), 287–314 (1994)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255. IEEE, June 2009
Jacobs, R.A.: Increased rates of convergence through learning rate adaptation. Neural Netw. 1(4), 295–307 (1988)
Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A.A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., Hassabis, D.: Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 201611835 (2017)
Klopf, A., Gose, E.: An evolutionary pattern recognition network. IEEE Trans. Syst. Sci. Cybern. 5(3), 247–250 (1969)
LeCun, Y., Cortes, C., Burges, C.J.C.: The MNIST database of handwritten digits (1998)
Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17(39), 1–40 (2016)
van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(Nov), 2579–2605 (2008)
Mahmood, A.R., Sutton, R.S.: Representation search through generate and test. In: AAAI Workshop on Learning Rich Representations from Low-Level Sensors, June 2013
Miotto, R., Li, L., Kidd, B.A., Dudley, J.T.: Deep Patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci. Rep. 6, 26094 (2016)
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Moravčík, M., Schmid, M., Burch, N., Lisý, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., Bowling, M.: DeepStack: expert-level artificial intelligence in no-limit poker (2017). arXiv preprint arXiv:1701.01724
Olshausen, B.A., Field, D.J.: Sparse coding with an overcomplete basis set: a strategy employed by V1? Vis. Res. 37(23), 3311–3325 (1997)
Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387. IEEE, March 2016
Ring, M.B.: CHILD: a first step towards continual learning. Mach. Learn. 28(1), 77–104 (1997)
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Cogn. Model. 5(3), 1 (1988)
Schraudolph, N.N.: Local gain adaptation in stochastic gradient descent (1999)
Sironi, A., Tekin, B., Rigamonti, R., Lepetit, V., Fua, P.: Learning separable filters. IEEE Trans. Pattern Anal. Mach. Intell. 37(1), 94–106 (2015)
Sutton, R.S.: Two problems with backpropagation and other steepest-descent learning procedures for networks. In: Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 823–831. Erlbaum, May 1986
Sutton, R.S.: Adapting bias by gradient descent: an incremental version of delta-bar-delta. In: AAAI, pp. 171–176, July 1992
Sutton, R.S.: Myths of representation learning. Lecture, ICLR (2014)
Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: AAAI, pp. 4278–4284 (2017)
Tieleman, T., Hinton, G.: Lecture 6.5 – RMSProp: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Netw. Mach. Learn. 4(2), 26–31 (2012)
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11(Dec), 3371–3408 (2010)
Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., ... Klingner, J.: Google's neural machine translation system: bridging the gap between human and machine translation (2016). arXiv preprint arXiv:1609.08144
Metadata
Title
Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks
Authors
Vivek Veeriah
Shangtong Zhang
Richard S. Sutton
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-71249-9_27
