
2015 | Original Paper | Book Chapter

7. Training and Decoding Speedup

Authors: Dong Yu, Li Deng

Published in: Automatic Speech Recognition

Publisher: Springer London

Abstract

Deep neural networks (DNNs) have many hidden layers, each of which contains many neurons. This greatly increases the total number of parameters in the model and slows down both training and decoding. In this chapter, we discuss algorithms and engineering techniques that speed up training and decoding. Specifically, we describe parallel training algorithms such as the pipelined backpropagation algorithm, the asynchronous stochastic gradient descent algorithm, and the augmented Lagrange multiplier algorithm. We also introduce model size reduction techniques based on low-rank approximation, which can speed up both training and decoding, and techniques such as quantization, lazy evaluation, and frame skipping that significantly speed up decoding.
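As a concrete illustration of the low-rank idea mentioned in the abstract, the sketch below (a hypothetical NumPy example, not code from the chapter) factorizes a weight matrix into two smaller matrices via truncated SVD, which is the basic operation behind low-rank model size reduction.

```python
import numpy as np

# Hypothetical sketch: approximate a DNN weight matrix W (m x n) by the
# product of two thin matrices A (m x k) and B (k x n). When k << min(m, n),
# the parameter count drops from m*n to k*(m + n), which speeds up both the
# forward pass (decoding) and the gradient computation (training).
def low_rank_factorize(W, k):
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]   # fold the top-k singular values into the left factor
    B = Vt[:k, :]
    return A, B

# Example: a 2048 x 2048 hidden-layer matrix approximated with rank 256.
W = np.random.randn(2048, 2048).astype(np.float32)
A, B = low_rank_factorize(W, k=256)
print("original parameters:", W.size)             # 4,194,304
print("factorized parameters:", A.size + B.size)  # 1,048,576
```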


Footnotes
1
More precisely, it can be approximated by the magnitude of the product of the weight and the input value. However, the magnitudes of the input values are relatively uniform within each layer, since the input-layer features are normalized to zero mean and unit variance and the hidden-layer values are probabilities.
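To make this concrete, the following is a minimal sketch (assuming NumPy; not the authors' implementation) of magnitude-based sparsification: because input magnitudes are roughly uniform within a layer, a connection's importance can be approximated by the weight magnitude alone, and the smallest-magnitude weights are zeroed out.

```python
import numpy as np

# Minimal sketch (assumption, not from the book): prune a weight matrix by
# keeping only the largest-magnitude connections, which approximates pruning
# by |weight * input| when input magnitudes are uniform within a layer.
def prune_by_magnitude(W, keep_fraction=0.3):
    threshold = np.quantile(np.abs(W), 1.0 - keep_fraction)
    mask = np.abs(W) >= threshold
    return W * mask, mask

W = np.random.randn(1024, 1024).astype(np.float32)
W_sparse, mask = prune_by_magnitude(W, keep_fraction=0.3)
print("fraction of weights kept:", mask.mean())  # ~0.30
```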
 
Metadata
Title
Training and Decoding Speedup
Authors
Dong Yu
Li Deng
Copyright Year
2015
Publisher
Springer London
DOI
https://doi.org/10.1007/978-1-4471-5779-3_7
