2015 | Original Paper | Book Chapter

6. Deep Neural Network-Hidden Markov Model Hybrid Systems

Authors: Dong Yu, Li Deng

Published in: Automatic Speech Recognition

Publisher: Springer London

Abstract

In this chapter, we describe one of the several possible ways of exploiting deep neural networks (DNNs) in automatic speech recognition systems—the deep neural network-hidden Markov model (DNN-HMM) hybrid system. The DNN-HMM hybrid system takes advantage of DNN’s strong representation learning power and HMM’s sequential modeling ability, and outperforms conventional Gaussian mixture model (GMM)-HMM systems significantly on many large vocabulary continuous speech recognition tasks. We describe the architecture and the training procedure of the DNN-HMM hybrid system and point out the key components of such systems by comparing a range of system setups.
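
The core idea summarized in the abstract is that the DNN replaces the GMM as the HMM's emission model: the DNN's senone posteriors are divided by the senone priors to obtain scaled likelihoods used during decoding. The following minimal sketch is not taken from the chapter; the function name, the random stand-in for DNN output, and the uniform priors are illustrative, and it assumes a trained DNN and priors estimated from forced alignments exist elsewhere.

    import numpy as np

    def pseudo_log_likelihoods(log_posteriors, log_priors):
        # In a DNN-HMM hybrid, the HMM emission score for senone s at frame t is
        #   log p(x_t | s) = log p(s | x_t) - log p(s) + const,
        # where p(s | x_t) comes from the DNN and p(s) is the senone prior
        # (typically counted from the training alignment). The constant
        # log p(x_t) is shared by all senones and can be dropped.
        return log_posteriors - log_priors

    # Illustrative usage with random numbers standing in for real DNN output.
    rng = np.random.default_rng(0)
    num_frames, num_senones = 5, 4
    posteriors = rng.dirichlet(np.ones(num_senones), size=num_frames)  # each row sums to 1
    priors = np.full(num_senones, 1.0 / num_senones)                   # e.g. from alignments
    emission_scores = pseudo_log_likelihoods(np.log(posteriors), np.log(priors))
    # emission_scores would replace the GMM log-likelihoods inside a standard
    # Viterbi/HMM decoder; the DNN, the decoder, and the alignment-based priors
    # are all assumed to be provided elsewhere.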

Footnotes
1
Compared with the desired segmental model, this duration model is very rough.
 
2
The independence assumption made in the HMM is one of the reasons why language model weighting is needed. If one doubles the number of frames by extracting a feature vector every 5 ms instead of every 10 ms, the total acoustic log-score roughly doubles, and so the language model weight also needs to be doubled to keep the two scores balanced (see the sketch below).
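
A worked restatement of this argument, using illustrative notation not taken from the chapter: the decoding score is a sum of per-frame acoustic terms plus a weighted language model term, so doubling the number of frames doubles the acoustic part while the language model part is unchanged.

    % Decoding objective with language model weight \lambda (illustrative notation):
    \[
      \hat{W} = \arg\max_{W} \; \sum_{t=1}^{T} \log p(x_t \mid s_t^{W}) \;+\; \lambda \, \log p(W).
    \]
    % Halving the frame shift from 10 ms to 5 ms doubles T, so the acoustic sum
    % roughly doubles while \log p(W) stays fixed; preserving the balance between
    % the two terms therefore requires \lambda \to 2\lambda.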
 
3
Unfair comparisons were made in several papers that compare the hybrid DNN/HMM system with the KL-HMM system; the conclusions drawn in those papers are therefore questionable.
 
Metadata
Title
Deep Neural Network-Hidden Markov Model Hybrid Systems
Authors
Dong Yu
Li Deng
Copyright year
2015
Publisher
Springer London
DOI
https://doi.org/10.1007/978-1-4471-5779-3_6
