Proposed in the 1940s as a simplified model of the elementary computing unit in the human cortex, artificial neural networks (ANNs) have been an active research area ever since. Among the many variants of the ANN, deep neural networks (DNNs) (Hinton, Osindero, and Teh 2006) stand out as a promising extension of the shallow ANN structure. The most prominent demonstration thus far of hierarchical learning based on DNNs, combined with Bayesian inference and deductive reasoning techniques, has been the performance of the IBM supercomputer Watson in the legendary 2011 tournament on the game show Jeopardy!
This chapter begins with a basic introduction to ANNs and then outlines the DNN structure and learning scheme.
References
Aleksandrovsky, Boris, James Whitson, Gretchen Andes, Gary Lynch, and Richard Granger. “Novel Speech Processing Mechanism Derived from Auditory Neocortical Circuit Analysis.” In Proceedings of the Fourth International Conference on Spoken Language, edited by H. Timothy Bunnell and William Idsardi, 558–561. Piscataway, NJ: Institute of Electrical and Electronics Engineers, 1996.
Arnold, Ludovic, Sébastien Rebecchi, Sylvain Chevallier, and Hélène Paugam-Moisy. “An Introduction to Deep Learning.” In Proceedings of the 19th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, Bruges, Belgium, April 27–29, 2011, edited by Michel Verleysen, 477–488. Leuven, Belgium: Ciaco, 2011.
Bengio, Yoshua. “Learning Deep Architectures for AI.” Foundations and Trends in Machine Learning 2, no. 1 (2009): 1–127.
Bengio, Yoshua. “Deep Learning of Representations for Unsupervised and Transfer Learning.” In ICML 2011: Proceedings of the International Conference on Machine Learning Unsupervised and Transfer Learning Workshop, edited by Isabelle Guyon, Gideon Dror, Vincent Lemaire, Graham Taylor, and Daniel Silver, 17–36. 2012. http://jmlr.csail.mit.edu/proceedings/papers/v27/bengio12a/bengio12a.pdf.
Bengio, Yoshua, and Olivier Delalleau. “On the Expressive Power of Deep Architectures.” In Algorithmic Learning Theory, edited by Jyrki Kivinen, Csaba Szepesvári, Esko Ukkonen, and Thomas Zeugmann, 18–36. Berlin: Springer, 2011.
Bengio, Yoshua, Pascal Lamblin, Dan Popovici, and Hugo Larochelle. “Greedy Layer-Wise Training of Deep Networks.” In NIPS ’06: Proceedings of Advances in Neural Information Processing Systems 19, edited by Bernhard Schölkopf, John Platt, and Thomas Hofmann, 153–160. Cambridge, MA: Massachusetts Institute of Technology Press, 2007.
Bengio, Yoshua, Jérôme Louradour, Ronan Collobert, and Jason Weston. “Curriculum Learning.” In ICML ’09: Proceedings of the 26th Annual International Conference on Machine Learning, edited by Léon Bottou and Michael Littman, 41–48. New York: ACM, 2009.
Brown, Thomas H., Edward W. Kairiss, and Claude L. Keenan. “Hebbian Synapses: Biophysical Mechanisms and Algorithms.” Annual Review of Neuroscience 13, no. 1 (1990): 475–511.
Cai, Xianggao, Zhanpeng Xu, Guoming Lai, Chengwei Wu, and Xiaola Lin. “GPU-Accelerated Restricted Boltzmann Machine for Collaborative Filtering.” In Algorithms and Architectures for Parallel Processing: Proceedings of the 12th International ICA3PP Conference, Fukuoka, Japan, September 2012, edited by Yang Xiang, Ivan Stojmenović, Bernady O. Apduhan, Guojun Wang, Koji Nakano, and Albert Zomaya, 303–316. Berlin: Springer, 2012.
Ciresan, Dan Claudiu, Ueli Meier, Luca Maria Gambardella, and Jürgen Schmidhuber. “Deep, Big, Simple Neural Nets for Handwritten Digit Recognition.” Neural Computation 22, no. 12 (2010): 3207–3220.
Ciresan, Dan Claudiu, Ueli Meier, and Jürgen Schmidhuber. “Transfer Learning for Latin and Chinese Characters with Deep Neural Networks.” In Proceedings of the 2012 International Joint Conference on Neural Networks, 1–6. Piscataway, NJ: Institute of Electrical and Electronics Engineers, 2012.
Collobert, Ronan, and Jason Weston. “A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning.” In ICML ’08: Proceedings of the 25th International Conference on Machine Learning, edited by Andrew McCallum and Sam Roweis, 160–167. New York: ACM, 2008.
Dahl, George E., Dong Yu, Li Deng, and Alex Acero. “Context-Dependent Pre-Trained Deep Neural Networks for Large-Vocabulary Speech Recognition.” IEEE Transactions on Audio, Speech, and Language Processing 20, no. 1 (2012): 30–42.
Deng, Li, Brian Hutchinson, and Dong Yu. “Parallel Training for Deep Stacking Networks.” In Interspeech 2012: Proceedings of the 13th Annual Conference of the International Speech Communication Association. 2012. www.isca-speech.org/archive/interspeech_2012.
Deselaers, Thomas, Saša Hasan, Oliver Bender, and Hermann Ney. “A Deep Learning Approach to Machine Transliteration.” In Proceedings of the Fourth Workshop on Statistical Machine Translation, 233–241. Stroudsburg, PA: Association for Computational Linguistics, 2009.
Desjardins, Guillaume, and Yoshua Bengio. “Empirical Evaluation of Convolutional RBMs for Vision.” Technical report, Université de Montréal, 2008.
Erhan, Dumitru, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. “Why Does Unsupervised Pre-Training Help Deep Learning?” Journal of Machine Learning Research 11 (2010): 625–660.
Erhan, Dumitru, Pierre-Antoine Manzagol, Yoshua Bengio, Samy Bengio, and Pascal Vincent. “The Difficulty of Training Deep Architectures and the Effect of Unsupervised Pre-Training.” In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics, edited by David van Dyk and Max Welling, 153–160. 2009. http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS09_ErhanMBBV.pdf.
Farley, B. G., and W. Clark. “Simulation of Self-Organizing Systems by Digital Computer.” IEEE Transactions of the IRE Professional Group on Information Theory 4, no. 4 (1954): 76–84.
Fischer, Asja, and Christian Igel. “An Introduction to Restricted Boltzmann Machines.” In Progress in Pattern Recognition, Image Analysis, Computer Vision, and Applications: Proceedings of the 17th Iberoamerican Congress, CIARP 2012, Buenos Aires, Argentina, September 3–6, 2012, edited by Luis Alvarez, Marta E. Mejail, Luis E. Gomez, and Julio E. Jacobo, 14–36. Berlin: Springer, 2012.
Fukushima, Kunihiko. “Neocognitron: A Self-Organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position.” Biological Cybernetics 36 (1980): 193–202.
Glorot, Xavier, Antoine Bordes, and Yoshua Bengio. “Domain Adaptation for Large-Scale Sentiment Classification: A Deep Learning Approach.” In ICML ’11: Proceedings of the 28th International Conference on Machine Learning, 513–520. 2011. www.icml-2011.org/papers/342_icmlpaper.pdf.
Hamel, Philippe, and Douglas Eck. “Learning Features from Music Audio with Deep Belief Networks.” In ISMIR 2010: Proceedings of the 11th International Society for Music Information Retrieval Conference, August 9–13, 2010, Utrecht, the Netherlands, edited by J. Stephen Downie and Rembo C. Veltkamp, 339–344. International Society for Music Information Retrieval, 2010. http://ismir2010.ismir.net/proceedings/ISMIR2010_complete_proceedings.pdf.
Hawkins, Jeff, and Sandra Blakeslee. On Intelligence. New York: Macmillan, 2007.
Haykin, Simon. Neural Networks. Upper Saddle River, NJ: Prentice Hall, 1994.
Hebb, Donald. The Organization of Behavior. New York: Wiley, 1949.
Hinton, Geoffrey E. “Training Products of Experts by Minimizing Contrastive Divergence.” Neural Computation 14, no. 8 (2002): 1771–1800.
Hinton, Geoffrey E. “To Recognize Shapes, First Learn to Generate Images.” Progress in Brain Research 165 (2007): 535–547.
Hinton, Geoffrey E. “A Practical Guide to Training Restricted Boltzmann Machines.” Momentum 9, no. 1 (2010).
Hinton, Geoffrey E., Li Deng, Dong Yu, George E. Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, et al. “Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups.” IEEE Signal Processing Magazine 29, no. 6 (2012): 82–97.
Hinton, Geoffrey E., Simon Osindero, and Yee-Whye Teh. “A Fast Learning Algorithm for Deep Belief Nets.” Neural Computation 18, no. 7 (2006): 1527–1554.
Hinton, Geoffrey E., and Ruslan R. Salakhutdinov. “Reducing the Dimensionality of Data with Neural Networks.” Science 313, no. 5786 (2006): 504–507.
Hochreiter, Sepp. “Untersuchungen zu dynamischen neuronalen Netzen.” Master's thesis, Technical University of Munich, 1991.
Hochreiter, Sepp, Yoshua Bengio, Paolo Frasconi, and Jürgen Schmidhuber. “Gradient Flow in Recurrent Nets: The Difficulty of Learning Long-Term Dependencies.” In A Field Guide to Dynamical Recurrent Neural Networks, edited by John F. Kolen and Stefan C. Kremer, 237–244. Piscataway, NJ: Institute of Electrical and Electronics Engineers, 2001.
Hörster, Eva, and Rainer Lienhart. “Deep Networks for Image Retrieval on Large-Scale Databases.” In Proceedings of the 16th ACM International Conference on Multimedia, 643–646. New York: ACM, 2008.
Jain, Anil K., Jianchang Mao, and K. M. Mohiuddin. “Artificial Neural Networks: A Tutorial.” Computer 29, no. 3 (1996): 31–44.
Jaitly, Navdeep, Patrick Nguyen, Andrew W. Senior, and Vincent Vanhoucke. “Application of Pretrained Deep Neural Networks to Large Vocabulary Speech Recognition.” In Interspeech 2012: Proceedings of the 13th Annual Conference of the International Speech Communication Association. 2012. www.isca-speech.org/archive/interspeech_2012/.
Lee, Honglak, Roger Grosse, Rajesh Ranganath, and Andrew Y. Ng. “Convolutional Deep Belief Networks for Scalable Unsupervised Learning of Hierarchical Representations.” In ICML ’09: Proceedings of the 26th Annual International Conference on Machine Learning, edited by Léon Bottou and Michael Littman, 609–616. New York: ACM, 2009.
Lo, Charles. “A FPGA Implementation of Large Restricted Boltzmann Machines.” In Proceedings of the 18th IEEE Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), May 2–4, 2010, Charlotte, NC, 201–208. Piscataway, NJ: Institute of Electrical and Electronics Engineers, 2010.
Ly, Daniel L., and Paul Chow. “High-Performance Reconfigurable Hardware Architecture for Restricted Boltzmann Machines.” IEEE Transactions on Neural Networks 21, no. 11 (2010): 1780–1792.
McAfee, Lawrence. “Document Classification Using Deep Belief Nets,” 2008.
Mesnil, Grégoire, Yann Dauphin, Xavier Glorot, Salah Rifai, Yoshua Bengio, Ian J. Goodfellow, Erick Lavoie, et al. “Unsupervised and Transfer Learning Challenge: A Deep Learning Approach.” In ICML 2011: Proceedings of the International Conference on Machine Learning Unsupervised and Transfer Learning Workshop, edited by Isabelle Guyon, Gideon Dror, Vincent Lemaire, Graham Taylor, and Daniel Silver, 97–110. 2012. http://jmlr.csail.mit.edu/proceedings/papers/v27/mesnil12a/mesnil12a.pdf.
Mohamed, Abdel-rahman, Tara N. Sainath, George Dahl, Bhuvana Ramabhadran, Geoffrey E. Hinton, and Michael A. Picheny. “Deep Belief Networks Using Discriminative Features for Phone Recognition.” In Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, 5060–5063. Piscataway, NJ: Institute of Electrical and Electronics Engineers, 2011.
Mohamed, Abdel-rahman, Dong Yu, and Li Deng. “Investigation of Full-Sequence Training of Deep Belief Networks for Speech Recognition.” In Interspeech 2010: Proceedings of the 11th Annual Conference of the International Speech Communication Association, edited by Takao Kobayashi, Keikichi Hirose, and Satoshi Nakamura, 2846–2849. 2010. www.isca-speech.org/archive/interspeech_2010/i10_2846.html.
Pape, Leo, Faustino Gomez, Mark Ring, and Jürgen Schmidhuber. “Modular Deep Belief Networks That Do Not Forget.” In Proceedings of the 2011 International Joint Conference on Neural Networks, 1191–1198. Piscataway, NJ: Institute of Electrical and Electronics Engineers, 2011.
Poon, Hoifung, and Pedro Domingos. “Sum-Product Networks: A New Deep Architecture.” In Proceedings of the 2011 IEEE International Conference on Computer Vision Workshops, 689–690. Piscataway, NJ: Institute of Electrical and Electronics Engineers, 2011.
Ranzato, Marc’Aurelio, and Martin Szummer. “Semi-Supervised Learning of Compact Document Representations with Deep Networks.” In ICML ’08: Proceedings of the 25th International Conference on Machine Learning, edited by Andrew McCallum and Sam Roweis, 792–799. New York: ACM, 2008.
Rosenblatt, Frank. “The Perceptron: A Probabilistic Model for Information Storage and Organization in the Brain.” Psychological Review 65, no. 6 (1958): 386–408.
Sainath, Tara N., Brian Kingsbury, Bhuvana Ramabhadran, Petr Fousek, Petr Novak, and Abdel-rahman Mohamed. “Making Deep Belief Networks Effective for Large Vocabulary Continuous Speech Recognition.” In Proceedings of the 2011 IEEE Workshop on Automatic Speech Recognition and Understanding, edited by Thomas Hain and Kai Yu, 30–35. Piscataway, NJ: Institute of Electrical and Electronics Engineers, 2011.
Schmidhuber, Jürgen. “Learning Complex, Extended Sequences Using the Principle of History Compression.” Neural Computation 4 (1992): 234–242.
Seide, Frank, Gang Li, and Dong Yu. “Conversational Speech Transcription Using Context-Dependent Deep Neural Networks.” In Interspeech 2011: Proceedings of the 12th Annual Conference of the International Speech Communication Association, edited by Piero Cosi, Renato De Mori, Giuseppe Di Fabbrizio, and Roberto Pieraccini, 437–440. 2011. www.isca-speech.org/archive/interspeech_2011.
Smolensky, Paul. “Information Processing in Dynamical Systems: Foundations of Harmony Theory.” In Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Vol. 1, edited by David E. Rumelhart, James L. McClelland, and the PDP Research Group, 194–281. Cambridge, MA: Massachusetts Institute of Technology Press, 1986.
Susskind, Joshua M., Geoffrey E. Hinton, Javier R. Movellan, and Adam K. Anderson. “Generating Facial Expressions with Deep Belief Nets.” In Affective Computing: Focus on Emotion Expression, Synthesis and Recognition, edited by Jimmy Or, 421–440. Vienna: I-Tech, 2008.
Uetz, Rafael, and Sven Behnke. “Locally-Connected Hierarchical Neural Networks for GPU-Accelerated Object Recognition.” In Proceedings of the NIPS 2009 Workshop on Large-Scale Machine Learning Parallelism and Massive Datasets. 2009.
Werbos, Paul. “Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences.” PhD thesis, Harvard University, 1974.
Weston, Jason, Frédéric Ratle, Hossein Mobahi, and Ronan Collobert. “Deep Learning via Semi-Supervised Embedding.” In Neural Networks: Tricks of the Trade, Second Edition, edited by Grégoire Montavon, Geneviève Orr, and Klaus-Robert Müller, 639–655. Berlin: Springer, 2012.
Wulsin, D. F., J. R. Gupta, R. Mani, J. A. Blanco, and B. Litt. “Modeling Electroencephalography Waveforms with Semi-Supervised Deep Belief Nets: Fast Classification and Anomaly Measurement.” Journal of Neural Engineering 8, no. 3 (2011): 036015.
Zhou, Shusen, Qingcai Chen, and Xiaolong Wang. “Active Deep Networks for Semi-Supervised Sentiment Classification.” In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, edited by Chu-Ren Huang and Dan Jurafsky, 1515–1523. Stroudsburg, PA: Association for Computational Linguistics, 2010.