2014 | Original Paper | Book Chapter
Published in:
Growing Adaptive Machines
We propose a theory that relates the difficulty of learning in deep architectures to culture and language. It is articulated around the following hypotheses: (1) learning in an individual human brain is hampered by the presence of effective local minima; (2) this optimization difficulty is particularly important when it comes to learning higher-level abstractions, i.e., concepts that cover a vast and highly nonlinear span of sensory configurations; (3) such high-level abstractions are best represented in brains by the composition of many levels of representation, i.e., by deep architectures; (4) a human brain can learn such high-level abstractions if guided by the signals produced by other humans, which act as hints or indirect supervision for these high-level abstractions; and (5) language and the recombination and optimization of mental concepts provide an efficient evolutionary recombination operator, giving rise to rapid search in the space of communicable ideas that helps humans build up better high-level internal representations of their world. Taken together, these hypotheses imply that human culture and the evolution of ideas have been crucial in countering an optimization difficulty that would otherwise make it very hard for human brains to capture high-level knowledge of the world. The theory is grounded in experimental observations of the difficulties of training deep artificial neural networks. Plausible consequences of this theory for the efficiency of cultural evolution are sketched.
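Hypothesis (4) can be made concrete with a small experiment. The sketch below (not from the chapter; the task, architecture, and hyperparameters are illustrative assumptions) trains a tiny network on 4-bit parity, a problem whose error surface is notorious for poor solutions under plain gradient descent. In the "hinted" condition, an auxiliary loss supervises the hidden units with intermediate concepts (the counting thresholds sum(x) >= k, from which parity is linearly decodable), playing the role of guidance from other humans.

```python
# A minimal sketch of hypothesis (4): indirect supervision of intermediate
# concepts ("hints") easing an otherwise hard optimization problem.
# Task, architecture, and hyperparameters are illustrative assumptions.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# All 16 four-bit inputs; the target is their parity.
X = np.array([[(i >> b) & 1 for b in range(4)] for i in range(16)], float)
s = X.sum(axis=1)
y = (s % 2).reshape(-1, 1)
# Intermediate concepts: indicators of sum(x) >= k. Each is realizable by a
# single sigmoid unit, and parity is linearly decodable from the four of them.
H_hint = np.stack([(s >= k).astype(float) for k in (1, 2, 3, 4)], axis=1)

def train(use_hints, steps=30000, lr=1.0, hint_weight=1.0, seed=0):
    rng = np.random.default_rng(seed)   # identical init for both conditions
    W1 = rng.normal(0, 0.5, (4, 4)); b1 = np.zeros(4)
    W2 = rng.normal(0, 0.5, (4, 1)); b2 = np.zeros(1)
    for _ in range(steps):
        H = sigmoid(X @ W1 + b1)         # hidden "concepts"
        out = sigmoid(H @ W2 + b2)
        d_out = out - y                  # cross-entropy gradient at the output
        d_H = (d_out @ W2.T) * H * (1 - H)
        if use_hints:                    # auxiliary squared loss on hidden units
            d_H += hint_weight * (H - H_hint) * H * (1 - H)
        W2 -= lr * H.T @ d_out / len(X); b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_H / len(X);   b1 -= lr * d_H.mean(axis=0)
    H = sigmoid(X @ W1 + b1)
    return ((sigmoid(H @ W2 + b2) > 0.5) == (y > 0.5)).mean()

# Without hints the net often stalls in a poor solution; with hints it
# typically reaches 100% accuracy (exact numbers vary with the seed).
print("accuracy without hints:", train(False))
print("accuracy with hints:   ", train(True))
```

The design choice mirrors the abstract's claim: the hints do not give away the answer, they only supervise intermediate representations, yet that indirect signal is what makes the final abstraction learnable.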
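Hypothesis (5), that recombining ideas across individuals yields rapid search, can likewise be caricatured in a few lines. In this sketch (again an illustrative assumption, not the chapter's model) "ideas" are bit strings scored by a royal-road fitness that rewards only complete 4-bit building blocks, so strict single-bit hill climbing gets no signal inside an unfinished block, while crossover can combine blocks discovered by different "minds".

```python
# A minimal sketch of hypothesis (5): recombination of partial solutions
# found by different individuals versus isolated local search.
# All parameters are illustrative.
import numpy as np

rng = np.random.default_rng(0)
BLOCK, N_BLOCKS = 4, 8
L = BLOCK * N_BLOCKS

def fitness(g):
    # Royal-road fitness: count the complete all-ones blocks.
    return sum(g[i:i + BLOCK].all() for i in range(0, L, BLOCK))

def hill_climb(steps):
    g = rng.integers(0, 2, L).astype(bool)
    for _ in range(steps):
        c = g.copy(); c[rng.integers(L)] ^= True   # flip one random bit
        if fitness(c) > fitness(g):                # strict improvement only:
            g = c                                  # plateaus give no signal
    return fitness(g)

def recombine(steps, pop_size=50):
    pop = rng.integers(0, 2, (pop_size, L)).astype(bool)
    for _ in range(steps // pop_size):
        f = np.array([fitness(g) for g in pop])
        parents = pop[np.argsort(f)[-pop_size // 2:]]   # keep the best half
        kids = []
        for _ in range(pop_size - len(parents)):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, L)
            kid = np.concatenate([a[:cut], b[cut:]])    # one-point crossover
            kid[rng.integers(L)] ^= True                # small mutation
            kids.append(kid)
        pop = np.vstack([parents, kids])
    return max(fitness(g) for g in pop)

# With a matched budget, isolated search tends to stall with few complete
# blocks, while recombination assembles many more (outcomes vary with seed).
print("hill climbing, blocks completed:", hill_climb(5000))
print("recombination, blocks completed:", recombine(5000))
```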
- Title
- Evolving Culture Versus Local Minima
- DOI
- https://doi.org/10.1007/978-3-642-55337-0_3
- Author
- Yoshua Bengio
- Publisher
- Springer Berlin Heidelberg
- Sequence Number
- 3
- Chapter Number
- Chapter 3