
2017 | OriginalPaper | Chapter

Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks

Authors : Vivek Veeriah, Shangtong Zhang, Richard S. Sutton

Published in: Machine Learning and Knowledge Discovery in Databases

Publisher: Springer International Publishing


Abstract

Representations are fundamental to artificial intelligence. The performance of a learning system depends on how the data is represented. Typically, these representations are hand-engineered using domain knowledge. Recently, the trend has been to learn representations through stochastic gradient descent in multi-layer neural networks, commonly called backprop. Learning representations directly from the incoming data stream reduces the human labour involved in designing a learning system; more importantly, it allows a learning system to scale up to difficult tasks. In this paper, we introduce a new incremental learning algorithm, called crossprop, which learns the incoming weights of hidden units based on the meta-gradient descent approach previously introduced by Sutton (1992) and Schraudolph (1999) for learning step-sizes. The final update equation introduces an additional memory parameter for each of these weights and generalizes the backprop update equation. Our empirical experiments show that crossprop learns and reuses its feature representation while tackling new and unseen tasks, whereas backprop relearns a new feature representation.
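The chapter's final crossprop update equation is not reproduced in this excerpt. As a rough illustration of the meta-gradient mechanism the abstract refers to, the sketch below implements Sutton's (1992) IDBD algorithm for a single linear unit: each weight carries a memory trace h_i and a learnable log step-size beta_i, adapted by meta-gradient descent. The function name `idbd_step`, the meta step-size `theta`, and the toy tracking task are our illustrative choices, not the paper's.

```python
import numpy as np

def idbd_step(w, h, beta, x, target, theta=0.01):
    """One IDBD update (Sutton, 1992). Per-weight step-sizes
    alpha_i = exp(beta_i) are themselves adapted by meta-gradient
    descent using a per-weight memory trace h_i -- the same style of
    mechanism that crossprop applies to the incoming weights of
    hidden units."""
    delta = target - w @ x                    # prediction error of the linear unit
    beta = beta + theta * delta * x * h       # meta-gradient step on log step-sizes
    alpha = np.exp(beta)                      # per-weight step-sizes
    w = w + alpha * delta * x                 # ordinary delta-rule step
    h = h * np.maximum(0.0, 1.0 - alpha * x * x) + alpha * delta * x  # decaying memory trace
    return w, h, beta

# Usage sketch: incrementally track a fixed linear target.
rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
w = np.zeros(3)
h = np.zeros(3)
beta = np.full(3, np.log(0.05))              # initial step-sizes of 0.05
errs = []
for t in range(2000):
    x = rng.standard_normal(3)
    y = w_true @ x
    errs.append((y - w @ x) ** 2)
    w, h, beta = idbd_step(w, h, beta, x, y)
```

On this stationary, noiseless problem the squared error shrinks over time; the interesting regime for such meta-gradient methods, as the paper's experiments emphasize, is when the task changes and a good representation (or step-size assignment) can be reused rather than relearned.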


Literature
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate (2014). arXiv preprint arXiv:1409.0473
Bellemare, M.G., Naddaf, Y., Veness, J., Bowling, M.: The arcade learning environment: an evaluation platform for general agents. J. Artif. Intell. Res. (JAIR) 47, 253–279 (2013)
Cho, K., van Merriënboer, B., Gulcehre, C., Bahdanau, D., Bougares, F., Schwenk, H., Bengio, Y.: Learning phrase representations using RNN encoder-decoder for statistical machine translation (2014). arXiv preprint arXiv:1406.1078
Comon, P.: Independent component analysis, a new concept? Signal Process. 36(3), 287–314 (1994)
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 248–255. IEEE, June 2009
Jacobs, R.A.: Increased rates of convergence through learning rate adaptation. Neural Netw. 1(4), 295–307 (1988)
Kirkpatrick, J., Pascanu, R., Rabinowitz, N., Veness, J., Desjardins, G., Rusu, A.A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., Hassabis, D.: Overcoming catastrophic forgetting in neural networks. Proc. Natl. Acad. Sci. 201611835 (2017)
Klopf, A., Gose, E.: An evolutionary pattern recognition network. IEEE Trans. Syst. Sci. Cybern. 5(3), 247–250 (1969)
LeCun, Y., Cortes, C., Burges, C.J.C.: The MNIST database of handwritten digits (1998)
Levine, S., Finn, C., Darrell, T., Abbeel, P.: End-to-end training of deep visuomotor policies. J. Mach. Learn. Res. 17(39), 1–40 (2016)
van der Maaten, L., Hinton, G.: Visualizing data using t-SNE. J. Mach. Learn. Res. 9(Nov), 2579–2605 (2008)
Mahmood, A.R., Sutton, R.S.: Representation search through generate and test. In: AAAI Workshop on Learning Rich Representations from Low-Level Sensors, June 2013
Miotto, R., Li, L., Kidd, B.A., Dudley, J.T.: Deep Patient: an unsupervised representation to predict the future of patients from the electronic health records. Sci. Rep. 6, 26094 (2016)
Mnih, V., Kavukcuoglu, K., Silver, D., Rusu, A.A., Veness, J., Bellemare, M.G., Graves, A., Riedmiller, M., Fidjeland, A.K., Ostrovski, G., Petersen, S.: Human-level control through deep reinforcement learning. Nature 518(7540), 529–533 (2015)
Moravčík, M., Schmid, M., Burch, N., Lisý, V., Morrill, D., Bard, N., Davis, T., Waugh, K., Johanson, M., Bowling, M.: DeepStack: expert-level artificial intelligence in no-limit poker (2017). arXiv preprint arXiv:1701.01724
Olshausen, B.A., Field, D.J.: Sparse coding with an overcomplete basis set: a strategy employed by V1? Vis. Res. 37(23), 3311–3325 (1997)
Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387. IEEE, March 2016
Ring, M.B.: CHILD: a first step towards continual learning. Mach. Learn. 28(1), 77–104 (1997)
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Cogn. Model. 5(3), 1 (1988)
Schraudolph, N.N.: Local gain adaptation in stochastic gradient descent (1999)
Sironi, A., Tekin, B., Rigamonti, R., Lepetit, V., Fua, P.: Learning separable filters. IEEE Trans. Pattern Anal. Mach. Intell. 37(1), 94–106 (2015)
Sutton, R.S.: Two problems with backpropagation and other steepest-descent learning procedures for networks. In: Proceedings of the 8th Annual Conference of the Cognitive Science Society, pp. 823–831. Erlbaum, May 1986
Sutton, R.S.: Adapting bias by gradient descent: an incremental version of delta-bar-delta. In: AAAI, pp. 171–176, July 1992
Sutton, R.S.: Myths of representation learning. Lecture, ICLR (2014)
Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: AAAI, pp. 4278–4284 (2017)
Tieleman, T., Hinton, G.: Lecture 6.5 – RMSProp: divide the gradient by a running average of its recent magnitude. COURSERA: Neural Netw. Mach. Learn. 4(2), 26–31 (2012)
Vincent, P., Larochelle, H., Lajoie, I., Bengio, Y., Manzagol, P.A.: Stacked denoising autoencoders: learning useful representations in a deep network with a local denoising criterion. J. Mach. Learn. Res. 11(Dec), 3371–3408 (2010)
Wu, Y., Schuster, M., Chen, Z., Le, Q.V., Norouzi, M., Macherey, W., ... Klingner, J.: Google's neural machine translation system: bridging the gap between human and machine translation (2016). arXiv preprint arXiv:1609.08144
Metadata
Title
Crossprop: Learning Representations by Stochastic Meta-Gradient Descent in Neural Networks
Authors
Vivek Veeriah
Shangtong Zhang
Richard S. Sutton
Copyright Year
2017
DOI
https://doi.org/10.1007/978-3-319-71249-9_27
