2016 | OriginalPaper | Book Chapter

Automatic Speech Recognition Based on Neural Networks

Authors: Ralf Schlüter, Patrick Doetsch, Pavel Golik, Markus Kitza, Tobias Menne, Kazuki Irie, Zoltán Tüske, Albert Zeyer

Published in: Speech and Computer

Publisher: Springer International Publishing


Abstract

In automatic speech recognition, as in many areas of machine learning, stochastic modeling relies increasingly on neural networks. In both acoustic and language modeling, neural networks today mark the state of the art for large vocabulary continuous speech recognition, providing substantial improvements over former approaches based solely on Gaussian mixture hidden Markov models and count-based language models. We give an overview of current activities in neural network based modeling for automatic speech recognition. This includes discussions of network topologies and cell types, training and optimization, choice of input features, adaptation and normalization, multitask training, and neural network based language modeling. Despite the clear progress obtained with neural network modeling in speech recognition, much remains to be done to obtain a consistent and self-contained neural network based modeling approach that ties in with the former state of the art. We conclude with a discussion of open problems as well as potential future directions regarding the integration of neural networks into automatic speech recognition systems.
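
One common way a neural network acoustic model ties in with the conventional HMM framework described above is the hybrid approach: the network's softmax outputs are treated as HMM state posteriors and divided by the state priors to obtain scaled emission likelihoods for decoding. The sketch below is a minimal, generic illustration of that conversion in numpy, assuming this standard hybrid setup; it is not the authors' specific recipe, and the function and variable names (posteriors_to_scaled_likelihoods, state_priors) are hypothetical.

```python
import numpy as np


def posteriors_to_scaled_likelihoods(posteriors, state_priors,
                                     prior_scale=1.0, eps=1e-10):
    """Convert frame-wise NN posteriors (T x S) into log scaled likelihoods.

    posteriors:   array of shape (T, S), one softmax distribution per frame
    state_priors: array of shape (S,), relative frequencies of the HMM states
    prior_scale:  exponent on the prior, a common tuning knob in hybrid systems
    """
    log_post = np.log(np.maximum(posteriors, eps))
    log_prior = prior_scale * np.log(np.maximum(state_priors, eps))
    # Bayes' rule: p(x|s) is proportional to p(s|x) / p(s); the per-frame
    # constant p(x) is dropped, which does not affect HMM decoding.
    return log_post - log_prior


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    post = rng.dirichlet(np.ones(3), size=4)   # toy softmax outputs: 4 frames, 3 states
    priors = np.array([0.5, 0.3, 0.2])         # toy HMM state priors
    print(posteriors_to_scaled_likelihoods(post, priors))
```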


Metadata
Title
Automatic Speech Recognition Based on Neural Networks
Authors
Ralf Schlüter
Patrick Doetsch
Pavel Golik
Markus Kitza
Tobias Menne
Kazuki Irie
Zoltán Tüske
Albert Zeyer
Copyright Year
2016
DOI
https://doi.org/10.1007/978-3-319-43958-7_1
