
2015 | Original Paper | Book Chapter

4. Deep Neural Networks

Authors: Dong Yu, Li Deng

Published in: Automatic Speech Recognition

Publisher: Springer London


Abstract

In this chapter, we introduce deep neural networks (DNNs), i.e., multilayer perceptrons with many hidden layers. DNNs play an important role in modern speech recognition systems and are the focus of the rest of the book. We describe the architecture of DNNs, present the popular activation functions and training criteria, illustrate the famous backpropagation algorithm for learning DNN model parameters, and introduce practical tricks that make the training process robust.
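To make the backpropagation computation concrete before the chapter proper, here is a minimal sketch (ours, not taken from the chapter) of one forward and backward pass through a single-hidden-layer perceptron with a sigmoid hidden layer and a softmax output; the layer sizes, learning rate, and variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: 4 inputs, 8 hidden units, 3 output classes.
W1, b1 = rng.normal(scale=0.1, size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(scale=0.1, size=(3, 8)), np.zeros(3)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = rng.normal(size=4)          # one input frame
t = np.array([0.0, 1.0, 0.0])   # one-hot target

# Forward pass.
h = sigmoid(W1 @ x + b1)
z = W2 @ h + b2
y = np.exp(z - z.max()); y /= y.sum()   # softmax posterior

# Backward pass (cross-entropy loss): propagate the error layer by layer.
dz = y - t                      # gradient at the output pre-activation
dW2 = np.outer(dz, h)
dh = W2.T @ dz
dz1 = dh * h * (1.0 - h)        # sigmoid derivative h(1 - h)
dW1 = np.outer(dz1, x)

# One stochastic gradient descent step.
lr = 0.1
W2 -= lr * dW2; b2 -= lr * dz
W1 -= lr * dW1; b1 -= lr * dz1
```

Real speech systems stack many such hidden layers and train on minibatches, but the chain-rule structure of the update is exactly this.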


Footnotes
1
The term deep neural network first appeared in [21] in the context of speech recognition, but was coined in [5], which converted the term deep belief network used in earlier studies [4, 17, 24] into the more appropriate term deep neural network. The term was originally introduced to mean multilayer perceptrons with many hidden layers, but was later extended to mean any neural network with a deep structure.
 
2
The output of the sigmoid function can be very close to 0 but cannot reach 0, while the output of the ReLU function can be exactly 0.
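As a small illustration of this distinction (ours, not the chapter's): mathematically sigmoid(z) > 0 for every finite z, while relu(z) = max(0, z) returns exact zeros for all z <= 0, which is the source of the exact sparsity in ReLU hidden layers. In finite precision a sigmoid can still underflow to 0 for extremely negative arguments, but over the usual operating range the footnote's distinction holds:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

z = np.array([-30.0, -1.0, 0.0, 1.0])
print(sigmoid(z))                 # tiny but positive at z = -30 (about 9.4e-14)
print(relu(z))                    # [0. 0. 0. 1.] -- exact zeros for z <= 0
print(np.count_nonzero(relu(z)))  # 1: three of the four units are exactly inactive
```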
 
3
Although the name backpropagation was coined in 1986 [19], the algorithm itself can be traced back at least to 1969 [3] as a multistage dynamic system optimization method.
 
4
In practice, we have found that we may achieve slightly better results if we use momentum only after the first epoch.
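A minimal sketch of this trick, assuming plain minibatch SGD with classical momentum; the grad callback, the batch structure, and the hyperparameter values are illustrative stand-ins, not the chapter's code:

```python
import numpy as np

def train(params, batches, grad, epochs=10, lr=0.1, momentum=0.9):
    """SGD with momentum, where momentum is disabled during the first epoch."""
    velocity = [np.zeros_like(p) for p in params]
    for epoch in range(epochs):
        # Use momentum only after the first epoch (epoch index 0).
        mu = 0.0 if epoch == 0 else momentum
        for batch in batches:
            grads = grad(params, batch)   # caller-supplied gradient callback
            for p, v, g in zip(params, velocity, grads):
                v *= mu                   # classical momentum: v = mu*v - lr*g
                v -= lr * g
                p += v                    # in-place update keeps the same arrays
    return params
```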
 
References
1.
Bengio, Y.: Practical recommendations for gradient-based training of deep architectures. In: Neural Networks: Tricks of the Trade, pp. 437–478. Springer (2012)
2.
Bottou, L.: Online learning and stochastic approximations. On-line Learn. Neural Netw. 17, 9 (1998)
3.
Bryson, A.E., Ho, Y.C.: Applied Optimal Control: Optimization, Estimation, and Control. Blaisdell Publishing Company, US (1969)
4.
Dahl, G.E., Yu, D., Deng, L., Acero, A.: Large vocabulary continuous speech recognition with context-dependent DBN-HMMs. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 4688–4691 (2011)
5.
Dahl, G.E., Yu, D., Deng, L., Acero, A.: Context-dependent pre-trained deep neural networks for large-vocabulary speech recognition. IEEE Trans. Audio, Speech Lang. Process. 20(1), 30–42 (2012)
6.
Duchi, J., Hazan, E., Singer, Y.: Adaptive subgradient methods for online learning and stochastic optimization. J. Mach. Learn. Res. (JMLR) 12, 2121–2159 (2011)
7.
Glorot, X., Bordes, A., Bengio, Y.: Deep sparse rectifier neural networks. In: Proceedings of International Conference on Artificial Intelligence and Statistics (AISTATS), pp. 315–323 (2011)
8.
Guenter, B., Yu, D., Eversole, A., Kuchaiev, O., Seltzer, M.L.: Stochastic gradient descent algorithm in the computational network toolkit. In: OPT2013: NIPS 2013 Workshop on Optimization for Machine Learning (2013)
9.
Hestenes, M.R., Stiefel, E.: Methods of conjugate gradients for solving linear systems. J. Res. Natl. Bur. Stand. 49(6), 409–436 (1952)
10.
Hinton, G.E., Srivastava, N., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.R.: Improving neural networks by preventing co-adaptation of feature detectors. arXiv preprint arXiv:1207.0580 (2012)
11.
Hinton, G., Deng, L., Yu, D., Dahl, G.E., Mohamed, A.R., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
12.
Hornik, K., Stinchcombe, M., White, H.: Multilayer feedforward networks are universal approximators. Neural Netw. 2(5), 359–366 (1989)
13.
14.
LeCun, Y., Bottou, L., Orr, G.B., Müller, K.R.: Efficient backprop. In: Neural Networks: Tricks of the Trade, pp. 9–50. Springer, Berlin (1998)
15.
Liu, F.H., Stern, R.M., Huang, X., Acero, A.: Efficient cepstral normalization for robust speech recognition. In: Proceedings of ACL Workshop on Human Language Technologies (ACL-HLT), pp. 69–74 (1993)
16.
Liu, D.C., Nocedal, J.: On the limited memory BFGS method for large scale optimization. Math. Program. 45(1–3), 503–528 (1989)
17.
Mohamed, A., Dahl, G.E., Hinton, G.E.: Deep belief networks for phone recognition. In: NIPS Workshop on Deep Learning for Speech Recognition and Related Applications (2009)
18.
Nesterov, Y.: A method of solving a convex programming problem with convergence rate O(1/k^2). Sov. Math. Dokl. 27, 372–376 (1983)
19.
Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning representations by back-propagating errors. Nature 323(6088), 533–536 (1986)
20.
Seide, F., Fu, H., Droppo, J., Li, G., Yu, D.: On parallelizability of stochastic gradient descent for speech DNNs. In: Proceedings of International Conference on Acoustics, Speech and Signal Processing (ICASSP) (2014)
21.
Seide, F., Li, G., Yu, D.: Conversational speech transcription using context-dependent deep neural networks. In: Proceedings of Annual Conference of International Speech Communication Association (INTERSPEECH), pp. 437–440 (2011)
22.
Snoek, J., Larochelle, H., Adams, R.P.: Practical Bayesian optimization of machine learning algorithms. arXiv preprint arXiv:1206.2944 (2012)
23.
Wang, S., Manning, C.: Fast dropout training. In: Proceedings of the 30th International Conference on Machine Learning (ICML-13), pp. 118–126 (2013)
24.
Yu, D., Deng, L., Dahl, G.: Roles of pre-training and fine-tuning in context-dependent DBN-HMMs for real-world speech recognition. In: Proceedings of Neural Information Processing Systems (NIPS) Workshop on Deep Learning and Unsupervised Feature Learning (2010)
Metadata
Title
Deep Neural Networks
Authors
Dong Yu
Li Deng
Copyright Year
2015
Publisher
Springer London
DOI
https://doi.org/10.1007/978-1-4471-5779-3_4
